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Abstract 


Tractable  Quantification  of  Metastability 
for  Robust  Bipedal  Locomotion 

by 

Cenk  Oguz  Saglam 

This  work  develops  tools  to  quantify  and  optimize  performance  metrics  for  bipedal 
walking,  toward  enabling  improved  practical  and  autonomous  operation  of  two-legged 
robots  in  real-world  environments.  While  speed  and  energy  efficiency  of  legged  locomo¬ 
tion  are  both  useful  and  straightforward  to  quantify,  measuring  robustness  is  arguably 
more  challenging  and  at  least  as  critical  for  obtaining  practical  autonomy  in  variable  or 
otherwise  uncertain  environmental  conditions,  including  rough  terrain.  The  intuitive  and 
meaningful  robustness  quantification  adopted  in  this  thesis  begins  by  stochastic  model¬ 
ing  of  disturbances  such  as  terrain  variations,  and  conservatively  defining  what  a  failure 
is,  for  example  falling  down,  slippage,  scuffing,  stance  foot  rotation,  or  a  combination  of 
such  events.  After  discretizing  the  disturbance  and  state  sets  by  meshing,  step-to-step 
dynamics  are  studied  to  treat  the  system  as  a  Markov  chain.  Then,  failure  rates  can  be 
easily  quantified  by  calculating  the  expected  number  of  steps  before  failure.  Once  robust¬ 
ness  is  measured,  other  performance  metrics  can  also  be  easily  incorporated  into  the  cost 
function  for  optimization. 

For  high  performance  and  autonomous  operation  under  variations,  we  adopt  a  ca¬ 
pacious  framework,  exploiting  a  hierarchical  control  structure.  The  low-level  controllers, 
which  use  only  proprioceptive  (internal  state)  information,  are  optimized  by  a  derivative- 
free  method  without  any  constraints.  For  practicability  of  this  process,  developing  an 
algorithm  for  fast  and  accurate  computation  of  our  robustness  metric  was  a  crucial  and 
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necessary  step.  While  the  outcome  of  optimization  depends  on  capabilities  of  the  con¬ 
troller  scheme  employed,  the  convenient  and  time-invariant  parameterization  presented 
in  this  thesis  ensures  accommodating  large  terrain  variations.  In  addition,  given  environ¬ 
ment  estimation  and  state  information,  the  high-level  control  is  a  behavioral  policy  to 
choose  the  right  low-level  controller  at  each  step.  In  this  thesis,  optimal  switching  policies 
are  determined  by  applying  dynamic  programming  tools  on  Markov  decision  processes 
obtained  through  discretization.  For  desirable  performance  in  practice  from  policies  that 
are  formed  using  meshing-based  approximation  to  the  true  dynamics,  robustness  of  high- 
level  control  to  environment  estimation  and  discretization  errors  are  ensured  by  modeling 
stochastic  noise  in  the  terrain  information  and  belief  state  while  solving  for  behavioral 
policies. 
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Chapter  1 


Introduction 


Over  the  past  few  decades,  a  variety  of  control  methods  have  made  robots  quite  reliable 
in  deterministic  factory  settings.  One  of  the  central  challenges  in  robotics  today  is  to 
attain  robust  performance  under  the  variations1  implied  by  real-world  conditions.  To 
achieve  robustness  for  reliable  operation  in  less-structured  environments,  quantification 
and  optimization  of  relevant  performance  metrics  are  essential  tasks.  Ideally,  a  robot 
should  also  utilize  information  about  the  environment  and  modify  its  motion  accordingly 
to  maximize  stability  and  autonomy. 

Although  the  methods  presented  in  this  thesis  are  potentially  applicable  to  autonomous 
wheeled  vehicles,  robotic  manipulators,  and  a  broader  class  of  hybrid  dynamical  systems 
in  general,  the  focus  is  on  legged  locomotion.  In  particular,  we  study  two-legged  under¬ 
actuated  robots  walking  on  rough  terrain,  for  which  measuring  and  achieving  stability  is 
still  a  major  challenge. 

While  mobility  is  an  undeniable  necessity  for  various  robotic  applications,  one  motiva¬ 
tion  for  bipedal  robot  walking  is  that  this  anthropomorphic  approach  provides  powerful 

1  As  explained  by  [Smith,  2012],  variations  can  be  categorized  as  variability,  which  means  “natural 
variation  in  some  quantity”,  and  uncertainty,  which  refers  to  “the  degree  of  precision  with  which  a 
quantity  is  measured”  [Belle,  2008]. 
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means  for  negotiating  intermittent  or  otherwise  rough  terrain,  where  wheels  would  be  in¬ 
effective.  Bipeds  have  the  capability  of  moving  on  steep  slopes,  climbing  ladders,  varying 
their  step  width,  passing  narrow  passages,  easily  turning  corners,  walking  on  a  tightrope, 
avoiding  obstacles,  and  traversing  with  small  footprints.  Developments  in  bipedal  locomo¬ 
tion  research  will  enable  replacing  humans  in  hazardous  jobs,  assisting  people  in  difficult 
or  time  consuming  tasks,  providing  better  prostheses  for  the  disabled,  and  rehabilitating 
the  injured. 

1.1  Stability  Measurement  for  Bipedal  Robots 

Designing  and  comparing  various  robotic  hardware  and/or  control  schemes  require 
evaluation  of  performance  using  appropriate  metrics.  In  particular,  measuring  success  of 
legged  locomotion  is  essential  toward  developing  more  capable  bipedal  robots.  Although 
the  performance  of  highly-dynamic  walkers  is  often  quantified  by  speed  or  energy  con¬ 
sumption  [Hobbclcn  and  Wisse,  2007],  these  qualities  are  tangentially  meaningful  if  the 
robot  is  likely  to  fall  in  several  steps.  For  reliable  operation,  the  stability  of  walking  must 
be  measured  and  increased.  Once  a  robot  can  walk  in  an  unlikely-to-fall  manner,  other 
metrics  such  as  speed  and  energy  efficiency  can  be  easily  incorporated.  As  an  example  ap¬ 
plication,  control  can  be  optimized  to  maximize  the  probability  of  traversing  a  particular 
terrain  with  a  desired  speed  and  given  limited  energy  capability. 

There  are  two  mainstream  approaches  to  bipedal  robot  design.  One  approach  relics 
heavily  on  ankle  torque  for  balance.  By  using  an  appropriately  large  foot  as  a  base  of 
support,  such  robots  can  be  model  as  fully  actuated  under  carefully  controlled  operating 
conditions.  The  other  approach  deliberately  uses  an  underactuated  ankle.  Without  ankle 
torque,  balance  control  requires  more  careful  planning,  but  it  also  enables  more  dynamic 
gaits,  with  smaller  footholds,  better  energy  efficiency,  and  potentially  greater  capabilities 
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in  regimes  where  large  perturbations  to  a  robot  would  cause  either  approach  to  become 
underactuated  at  the  ground  contact.  The  differences  in  these  two  approaches’  design 
principles  lead  to  qualitatively  distinct  walking  motions  and  numerous  metrics  have  been 
proposed  to  quantify  their  stability  [Pratt  et  ah,  2001],  [Santos  et  ah,  2007],  [Hobbelcn 
and  Wisse,  2007],  [Byl,  2008],  some  of  which  are  summarized  next. 

1.1.1  Flat  Footed  Humanoids 

Many  well  known  and  very  capable  humanoids,  including  the  robots  shown  in  Fig¬ 
ure  1.1,  are  designed  to  have  large  feet  to  make  locomotion  task  relatively  easier.  Although 
humans  roll  their  feet  while  walking  for  energy  efficiency,  these  humanoids  are  often  con¬ 
trolled  aiming  to  keep  at  least  one  foot  flat  on  the  ground  at  all  times.  In  other  words, 
the  objective  is  to  prevent  the  foot  on  the  ground  from  rotating  to  model  robot  as  fully 
actuated.  Multiple  methods  are  proposed  to  achieve  such  a  balance.  As  [Sugihara,  2009] 
explains,  these  balance  criteria  are  often  used  to  show  stability.  However,  we  will  later 
argue  that  distinguishing  the  two  is  an  important  step  toward  human-like  walking  robots. 

The  most  elementary  approach  to  balancing  is  ignoring  the  kinematics  and  dynamics 
of  the  system  and  modeling  the  robot  as  a  mass  with  a  support  polygon2  as  shown 
in  Figure  1.2.  In  case  the  projection  of  this  mass’  location  to  the  ground  surface  is 
within  the  support  polygon,  the  robot  is  statically  stable.  Although  this  approach  is 
appealing  because  of  its  simplicity,  given  the  dynamics  of  a  walker,  static  stability  is 
neither  necessary  nor  sufficient  condition  for  keeping  the  foot  flat  or  walking  stably. 

Static  stability  margin  concept  has  been  modified  in  various  ways  to  consider  the 
kinematics  and  dynamics  of  the  robot.  One  extension,  which  has  lead  to  the  most  popular 
stability  metric  overall  today,  focuses  on  the  center  of  pressure  (CoP)  on  the  foot  instead 

2Convex  hull  formed  by  the  contacts  between  the  feet  and  the  ground  [Tedrake,  2004].  In  the  2D  case, 
support  polygon  is  a  line,  e.g.,  AB  in  Figure  1.2. 
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(a)  ASIMO 


(d)  SURALP 


(b)  ATLAS 


(e)  NAO 


(c)  HUBO 


(f)  Darwin 


Figure  1.1:  Some  state-of-the-art  bipedal  robots  with  large  feet.  ASIMO,  ATLAS, 
HUBO  and  SURALP  are  human  sized  robots  built  for  research  purposes.  NAO  and 
Darwin  are  smaller  bipeds  which  are  commercially  available. 

ASIMO  and  ATLAS  images  are  permitted  for  non- commercial  use  by  American  Honda  Motor  Com¬ 
pany  and  The  Defense  Advanced  Research  Projects  Agency  (DARPA)  respectively.  HUBO  is  manufac¬ 
tured  by  Korea  Advanced  Institute  of  Science  and  Technology  (KAIST)  and  its  image  is  taken  from 
http://dailyscene.com/wp-content/uploads/2011/12/Hubo.jpg.  Prof.  Kemalettin  Erbatur  from  Sabanci  Uni¬ 
versity  of  Turkey  kindly  permitted  the  use  of  SURALP  image.  Aldebaran  Robotics  is  the  manufacturer  of 
NAO  and  its  image  is  obtained  from  http://hcr.mines.edu/images/NaoStand.png.  Use  of  DARWIN  image  is 
permitted  by  ROBOTIS  Incorporation. 
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CoM  ^ 

0 

A  PB 

(a)  A  statically  stable  object.  Static  stability  mar¬ 
gin  is  given  by  min(PA,  PB),  the  minimum  dis¬ 
tance  from  P  to  the  edges  of  the  support  polygon. 


(b)  A  statically  unstable  object.  Static  stability 
margin  is  given  by  -PB,  minus  the  minimum  dis¬ 
tance  from  P  to  the  edges  of  the  support  polygon. 


Figure  1.2:  Illustration  of  static  stability  in  2D.  CoM  denotes  the  center  of  mass. 
P  shows  the  projection  of  the  CoM.  AB,  the  line  segment  formed  by  points  A  and 
B,  corresponds  to  the  support  polygon.  If  P  is  in  (outside)  AB,  then  the  object  is 
statically  stable  (unstable). 


of  the  center  of  mass  location  [Vukobratovic  and  Borovac,  2004],  As  shown  in  Figure  1.3, 
if  the  CoP  is  strictly  inside  the  footprint,  then  the  foot  does  not  rotate  and  the  robot  is 
balanced.  In  this  case  CoP  is  equivalent  to  the  zero  moment  point  (ZMP). 


(a)  ZMP  criterion  is  met  (center  of  pressure  is  (b)  The  foot  is  rotated  and  the  robot  touches  the 
strictly  within  footprint).  As  a  result,  the  foot  does  ground  only  at  a  point.  Then  this  point  is  where 
not  rotate.  the  ZMP  acts. 

Figure  1.3:  Zero  moment  point  criterion.  P  denotes  the  center  of  pressure  acting  on 
the  foot.  If  P  is  strictly  within  footprint,  then  the  ZMP  criterion  is  satisfied  and  the 
foot  does  not  rotate.  ZMP  based  controllers  aim  at  keeping  at  least  one  foot  flat  on 
the  ground. 
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[Goswami  and  Kallem,  2004]  generalizes  ZMP  concept  by  proposing  a  criterion  based 
on  angular  momentum  to  preserve  the  balance.  Algorithms  based  on  linear  and  angular 
momentum  have  shown  to  be  useful  in  whole  body  motion  planning  and  control  [Ugurlu 
and  Kawamura,  2012]  [Orin  et  ah,  2013]  [Kajita  et  ah,  2003]. 

Although  humanoids  are  typically  controlled  to  preserve  balance  to  establish  stability, 
one  idea  in  this  thesis  is  that  stability  is  different  than  balance.  In  human-like  walking, 
each  step  is  basically  an  intentional  “fall”  onto  the  next  foot.  Avoiding  underactuation 
for  bipeds  often  results  in  energy  inefficiency.  Indeed,  the  cost  of  transport3  (CoT)  is 
estimated  to  be  0.2  for  humans  and  3.2  for  the  infamous  humanoid  ASIMO  shown  in 
Figure  1.1a  [Collins  et  ah,  2005].  On  the  other  hand,  [Kuo,  2007]  explains  that  serious 
advances  in  control  are  necessary  to  achieve  the  high  versatility  of  robots  like  ASIMO 
while  being  as  energy  efficient  as  the  walkers  of  next  section. 

1.1.2  Dynamic  Walkers 

Toward  developing  more  energy  efficient,  dynamic,  fast,  agile  and  humanlike  walk¬ 
ing  robots,  Tad  McGeer  has  been  inspirational  by  showing  that  even  passive  bipeds 
(two-legged  walkers  that  have  no  motors)  can  walk  downhill  in  a  stable  manner  using 
gravitational  forces  [McGeer,  1990].  A  passive  robot  built  by  [Collins  et  al.,  2001]  is  de¬ 
picted  in  Figure  1.4a.  A  key  and  revolutionary  aspect  of  these  machines  is  that  part  of 
their  walking  cycle  is  unbalanced,  but  the  overall  walking  motion  is  stable.  This  prop¬ 
erty,  which  is  ambiguously  termed  as  dynamic  walking ,  has  pioneered  a  major  trend  in 
bipedal  locomotion  research,  where  one  of  the  main  goals  is  exploiting  underactuation4 
like  humans  rather  than  avoiding  it  as  in  flat  footed  walking. 

3The  non-dimensionalized  energy  expenditure  per  unit  weight  and  unit  distance  [Tucker,  1975]. 

4lack  of  ability  to  control  all  degrees-of- freedom.  Approximately,  the  robot  in  Figure  1.3a  is  fully 
actuated,  but  it  is  underactuated  in  Figure  1.3b. 
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Inspired  by  passive  walkers,  researchers  have  demonstrated  a  range  of  powered  walk¬ 
ers  based  on  exploiting  natural  dynamics,  and  the  approach  has  led  to  record-breaking 
performance  in  walking  with  only  onboard  power  (energetically  autonomous)  [Bhounsule 
et  al.,  2014],  Figure  1.4b  illustrates  the  Cornell  powered  biped  designed  by  [Collins  and 
Ruina,  2005],  which  is  as  energy  efficient  as  humans  [Collins  et  ah,  2005]. 

Although  these  walkers  have  been  groundbreaking,  robots  that  are  designed  dom¬ 
inantly  for  energy  efficiency  are  typically  sensitive  to  perturbations,  thus  they  do  not 
perform  as  well  as  flat  footed  robots  like  ASIMO  on  rough  terrain.  To  close  the  gap, 
various  dynamic  walkers  have  been  designed  for  better  rough  terrain  capabilities,  three 
of  which  are  depicted  in  the  bottom  row  of  Figure  1.4.  RABBIT  has  been  a  prominent 
test  bed  for  theory  both  in  simulation  and  experiments  [Chevallereau  et  ah,  2003].  This 
robot  was  later  followed  by  MABEL,  which  proved  to  be  a  much  more  capable  hardware 
on  rough  terrain  [Grizzle  et  ah,  2009].  Using  booms,  these  two  robots  were  constrained 
to  walk  in  2  dimensional  space  to  tackle  the  problem  of  understanding  underactuated 
walking.  A  more  recent  point-footed  walker  is  ATRIAS  [Hurst,  2015a],  which  very  re¬ 
cently  showed  the  ability  to  walk  in  3D  [Hurst,  2015b].  Notice  that  all  three  robots  are 
underactuated  because  they  have  point  feet  to  emulate  the  foot  rolling  in  human  loco¬ 
motion.  Once  the  relatively  more  complicated  task  of  robust  walking  with  no  real  feet  is 
achieved,  the  addition  of  complex  feet  and  ankle  torque  only  helps  toward  making  the 
robot  more  stable  and  capable. 

Under  deterministic  conditions,  dynamic  bipeds  exhibit  locally  stable  limit  cycles 
that  repeat  once  per  step.  The  stability  under  disturbances  is  then  often  studied  by  local 
approximations  on  these  limit  cycles  by  investigating  deviations  from  the  trajectories 
(gait  sensitivity  norm  [Hobbclen  and  Wisse,  2007],  H0 0  cost  [Morimioto  et  ah,  2003],  and 
L2  gain  [Dai  and  Tedrake,  2013]),  or  the  speed  of  convergence  back  after  such  deviations 
(using  Floquet  theory  [Hurmuzlu  and  Basdogan,  1994],  [McGeer,  1990]). 
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(a)  Passive  walker 


(b)  Cornell  powered  biped 


(c)  RABBIT  (d)  MABEL  (e)  ATRIAS 

Figure  1.4:  Some  robots  that  do  not  maintain  static  stability  while  exhibiting  stable 
walking  motion  on  deterministic  terrain. 

Professor  Andy  Ruina  kindly  allowed  the  use  of  the  passive  walker  and  Cornell  powered  biped  images.  RAB¬ 
BIT  was  designed  and  constructed  at  the  Control  Department  of  GIPSA-Lab  (CNRS),  at  Grenoble  France. 
Professor  Carlos  Canudas  de  Wit,  Professor  Jessy  Grizzle,  and  Professor  Jonathan  Hurst  permitted  using 
the  images  of  RABBIT,  MABEL,  and  ATRIAS  respectively. 
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Alternative,  the  largest  single  disturbance  that  the  robot  can  handle  without  falling 
can  be  measured  deterministically  [Pratt  et  ah,  2001],  [Hobbclcn  and  Wisse,  2007], 
[McGeer,  1990].  While  this  simple  idea  provides  a  good  intuition  for  stability,  we  be¬ 
lieve  the  expected  number  of  steps  before  failure  metric,  described  in  the  next  section,  is 
more  suitable  for  controller  optimization,  and  the  process  in  obtaining  it  also  provides 
valuable  information  to  be  exploited  in  Section  1.2. 

1.1.3  Expected  Number  of  Steps  Before  Failure 

This  thesis  adopts  and  extends  a  very  intuitive  and  meaningful  stability  metric  intro¬ 
duced  by  [Byl,  2008],  which  has  been  applied  to  dynamic  walkers.  It  is  also  potentially 
applicable  to  flat  footed  humanoids  among  many  other  hybrid  dynamical  systems.  [Byl, 
2008]  presented  a  methodology  to  calculate  the  expected  number  of  steps  before  failure 
under  stochastic  disturbances,  e.g.,  they  assumed  slopes  ahead  of  the  robot  generate  a 
Gaussian  distribution.  This  analysis  can  be  interpreted  either  as  estimation  of  failure 
rates  under  disturbances  or  verification  of  robustness  to  these  perturbations,  depending 
on  the  application. 

The  method,  which  is  explained  in  detail  in  Chapter  2,  was  originally  illustrated 
on  two  and  four  dimensional  systems  [Byl  and  Tedrake,  2009].  Later  [Chen  and  Byl, 
2012]  applied  this  machinery  to  a  six  dimensional  walker.  One  of  the  contributions  of 
this  thesis  is  avoiding  the  curse  of  dimensionality  to  show  the  applicability  of  this  tool 
to  even  higher  dimensional  systems  by  illustrating  on  a  robot  with  a  10D  state  space. 
The  results  indicate  a  clear  promise  in  applications  to  even  higher  degrees-of-freedom 
humanoids  with  complex  feet  walking  in  3D. 

In  addition  to  calculating  expected  number  of  steps  before  failure,  we  also  extend 
the  framework  to  calculate  other  performance  measures  under  stochastic  conditions,  e.g., 
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expected  speed  or  energy  consumption  per  step  under  stochastic  conditions,  which  are 
different  from  the  values  associated  with  the  limit  cycle  motion  in  deterministic  environ¬ 
ments.  We  can  also  compute  metrics  like  expected  distance  or  (continuous)  time  before 
failure. 


1.2  Autonomous  Walking 

To  measure  performance,  the  framework  mentioned  in  Section  1.1.3  works  by  learning 
the  dynamics  that  govern  the  walking  system.  This  process  also  provides  convenient 
means  for  designing  high-level  behavioral  algorithms  that  utilize  state  information  and 
optional  estimation  of  environmental  parameters.  Adopting  such  a  hierarchical  control 
structure  does  not  only  increase  the  stability  dramatically,  but  it  also  brings  autonomy 
to  the  robot.  Exploiting  this  capability  is  one  of  the  main  contributions  of  this  thesis. 

High-level  control  proved  to  be  greatly  useful  experimentally  in  [Park  et  al.,  2012], 
where  a  simple  policy  was  employed:  If  the  last  step  experienced  a  step-down  of  more 
than  3  cm,  “shock  absorbing  controller”  was  used.  Otherwise  the  “baseline  controller”  was 
activated.  On  the  other  hand,  our  framework  provides  a  systemic  way  of  obtaining  more 
complex  and  optimal  policies  in  more  general  scenarios.  These  behavioral  algorithms  can 
also  optionally  use  look-ahead  information  regarding  the  terrain  when  available. 

Robots  can  alternatively  utilize  information  about  their  environment  by  kinodynamic 
motion  planning  technique  proposed  in  [Donald  et  ah,  1993].  In  comparison,  our  approach 
falls  broadly  into  the  machine  learning  class  instead.  Once  the  high-level  control  policy 
is  obtained  off-line,  the  only  on-line  calculation  is  to  use  this  precomputed  look-up  ta¬ 
ble  at  each  step  to  pick  the  appropriate  low-level  controller,  which  makes  the  approach 
compatible  with  dynamic  walking. 
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1.3  Organization  of  Thesis 

By  focusing  on  a  toy  example,  the  next  chapter  explains  the  mathematical  tools  upon 
which  the  rest  of  this  thesis  builds  on.  The  potential  curse  of  dimensionality  problem  in 
using  this  tool  is  avoided  in  Chapter  3,  where  a  powerful  meshing  method  is  introduced. 
Chapter  4  optimizes  and  benchmarks  control  action  using  the  stability  metric  mentioned 
in  Section  1.1.3.  In  addition,  Chapter  5  adopts  a  hierarchical  control  structure  to  increase 
the  performance  even  more  dramatically  by  optionally  using  the  environmental  informa¬ 
tion  for  autonomous  operation.  Conclusions  and  potential  future  work  are  presented  in 
Chapter  6. 
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Metastable  Walking 


Metastable  systems  can  be  natural  or  human-made.  They  exist  in  a  precarious  state 
of  stability,  appearing  to  be  locally  stable  for  long  periods  of  time  until  an  external 
disturbance  perturbs  the  system  into  a  region  of  state  space  with  a  qualitatively  different 
local  behavior.  Since  these  systems  are  guaranteed  to  escape  such  locally  well-behaved 
regions  with  probability  one  given  enough  time,  they  cannot  be  classified  as  “stable”,  but 
it  is  also  misleading  to  categorize  them  simply  as  “unstable” .  Physicists  have  explored  this 
phenomenon  in  detail  and  have  developed  a  number  of  tools  for  quantifying  metastable 
behaviors  [Talkner  et  ah,  1987],  [Hanggi  et  ah,  1990],  [Muller  et  ah,  1997],  [Kampen, 
2011].  Metastable  processes  have  been  observed  in  many  other  branches  of  science  and 
engineering  including  familiar  systems  such  as  crystalline  structures  [Larsen  and  Grier, 
1997],  flip-flops  [Veendrick,  1980],  and  neuroscience  [Fingelkurts  and  Fingelkurts,  2004], 
While  focusing  on  rough  terrain  walking,  this  thesis  aims  to  deal  with  any  metastablc 
system  for  which  escape  (fail  or  success)  is  guaranteed  due  to  the  variations  in  its  environ¬ 
ment.  To  elaborate,  consider  the  energy  profile  in  Figure  2.1.  Assume  we  start  in  state-M 
and  the  probability  of  moving  to  state-T  goes  to  1  in  time  due  to  the  disturbances  acting 
in  the  system.  Let’s  name  state-T  as  transition  state,  from  which  we  either  go  back  to 
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state-M  or  fall  to  a  stable  lower  energy  state,  namely  state-S.  In  this  setting,  state-M  is 
not  stable  in  the  strict  sense  because  disturbances  may  result  in  moving  to  state-S.  How¬ 
ever,  if  the  transitions  from  state-M  to  state-T  are  rare,  it  is  also  misleading  to  categorize 
state-M  simply  as  unstable.  Since  it  is  “long-living,  but  destined  to  eventually  end”  [Byl, 
2008],  we  classify  state-M  metastable. 


Figure  2.1:  Cartoon  explaining  metastability.  Under  deterministic  conditions,  state-M 
is  a  locally-stable  equilibria  in  a  potential.  However,  with  sufficient  noise  in  the  model, 
the  particle  is  guaranteed  to  transition  to  lower  energy  state,  namely  state-S.  If  these 
guaranteed  transitions  are  extremely  rare,  states-M  is  metastable. 

Figure  is  inspired  from  “Meta- stability"  by  Georg  Wiora  licensed  under  GFDL.  See  [Benallegue  and  Laumond, 
2013]  for  an  inspiring  interpretation  of  metastable  walking. 


Figure  2.2  shows  how  states  depicted  in  Figure  2.1  look  in  human  walking.  The 
key  point  is  to  realize  that  even  humans  fail  in  walking  from  time  to  time  for  various 
reasons.  So,  walking  is  the  metastable  state  in  bipedal  locomotion,  and  the  transition  state 
represents  staggering  or  stumbling.  In  reality,  failure  to  walk  is  not  absorbing  because 
humans  and  robots  get  up  after  failing.  It  becomes  clear  later  in  the  text  how,  for  our 
purposes,  modeling  the  failure  as  a  stable  state  does  not  ignore  the  ability  to  start  walking 
again. 
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(a)  Failure  in  human  walking  due  to  slippage 


(b)  Failure  in  human  walking  due  to  ground  being  lower  than  expected 


(c)  Failure  in  human  walking  due  to  tripping 


Figure  2.2:  Some  common  failure  reasons  in  human  walking.  State-M,  State-T  and 
State-S  correspond  to  walking,  stumbling  and  failure  as  depicted  in  Figure  2.1 

Images  are  take  from  the  following  YouTube  vidoes:  https://www. youtube. com/watch?v=aQ99VULQRIf, 
https: / / www. youtube. com/watch ?v=oCEZRWZqX9w,  and  https: / /www. youtube. com/watch ?v=R_B  1  PkgA 3k A . 
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Before  proceeding  to  mathematical  formulation  of  the  framework,  we  would  like  to 
clarify  the  notation  of  state.  In  the  representative  image  of  Figure  2.1,  walking  is  a  state. 
However,  the  state  of  the  robot  is  actually  a  numeric  value  denoted  by  x  which  changes 
while  walking.  The  goal  of  this  chapter  is  to  represent  a  walking  system  in  the  format 
shown  in  Figure  2.1. 


2.1  Hybrid  Model 


Let  x ,  7,  and  (  be  the  internal  state,  the  randomness  system  experiences,  and  the 
control  action  respectively.  To  illustrate,  for  a  walking  robot,  x  is  the  robot’s  state,  7  is 
the  random  variable  representing  factors  such  as  terrain  variation  and  system  noise,  and 
(  is  the  control  action  which  may  be  a  function  of  x  and  7.  Define  vector  y  :=  [a;;  7;  £]  to 
represent  them  all.  Then,  the  general  hybrid  model  is  represented  as 


y  =  f(y )  y  e  c, 

y+  =  g(y)  y  e  D. 


(2.1) 


C  and  D  are  flow  and  jump  sets  [Goebel  et  ah,  2012],  This  setting  is  compatible  with 
less  general  cases  like  continuous  and  discrete  systems  with/without  a  control  action  or 
randomness. 


2.2  Discretization  for  State  Machine  Representation 

The  first  step  in  discretization  precess  is  choosing  a  Poincare-like  section,  noted  by 
S,  such  that  if  the  system  has  not  failed  yet,  it  needs  to  keep  passing  through  this 
section.  For  example,  the  hybrid  dynamics  of  walking  systems  are  punctuated  by  discrete 
impacts  when  a  foot  comes  into  contact  with  the  ground.  These  impacts  provide  a  natural 
discretization  of  the  robot  motion. 
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Abuse  the  notation  by  letting  x  refer  to  the  state  when  y  e  S.  Then,  the  next  state  is 
a  function  of  the  current  state  x [n] ,  the  randomness  experienced  7  [n],  and  the  controller 
action  during  that  step  £[n],  that  is 

x[n  +  1]  =  p(x[n\,j[n\,([n}).  (2.2) 

To  obtain  a  (discrete)  Markov  decision  process  model,  finite  sets  for  control  action, 
randomness,  and  state  are  needed.  The  first  one  is  rather  easy,  finitely  many  low-level 
controllers  are  designed  which  form  the  controller  set  Z.  The  second  (randomness), 
is  straightforward  to  handle  when  the  number  of  noise  sources  is  low.  For  instance, 
{0,  yd-j-,  ^  2}  is  a  uniform  discretization  of  [0, 1]  with  k  elements.  In  this  thesis  we 

study  1  dimensional  disturbances  and  the  density  of  the  randomness  set  T  is  a  function 
of  parameter  d7  given  by 

r=|7:  J-  e  Z,  7/  <  7  <  7rj  ,  (2.3) 

where  7 1  and  7r  determine  the  range  of  randomness  set,  which  needs  to  be  wide  enough 
to  model  the  terrain  of  interest.  Having  a  wider  than  needed  randomness  range  has  no 
disadvantage,  but  too  narrow  range  causes  inaccuracy.  In  particular,  the  robot  should 
not  be  able  to  walk  at  the  boundaries  of  the  randomness  set,  otherwise  we  extend  the 
range. 

Just  like  range,  density  of  the  randomness  set  can  be  chosen  depending  on  the  con¬ 
trollers’  performance,  and  on  the  robot.  Also,  the  randomness  set  does  not  have  to  be 
evenly  spaced  in  general,  it  may  be  denser  around  values  of  particular  interest.  As  we 
increase  the  density  of  the  randomness  set,  we  are  able  to  capture  the  dynamics  more 
accurately  at  the  expense  of  higher  computational  costs. 

If  the  internal  state  x  is  also  low-dimensional,  the  entire  state  space  can  potentially  be 
uniformly  meshed  to  obtain  state  set  X.  However,  as  dimensionality  increases  this  method 

becomes  intractable.  This  potential  curse  of  dimensionality  is  handled  in  Chapter  3. 
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Once  control,  randomness  and  state  spaces  are  represented  by  finite  sets,  we  simulate 
p(x,  7,C)  for  each  x  G  X,  7  G  T,  and  (  G  Z  to  obtain  the  state-transition  map,  which 
gives  deterministic  information  about  the  dynamics.  In  case  the  resulting  point  is  not 
in  the  state  set,  i.e.,  p(x,  7,  £)  ^  X,  we  need  to  approximate  the  dynamics.  The  most 
elementary  approach  is  O’th  order  approximation  given  by 

x[n  +  l]  «  c(p(x[n],i[n],([n]),X),  (2.4) 

where  c(x,  X)  is  the  closest  point  x  G  X  to  x  for  the  employed  distance  metric.  Then  the 
deterministic  state  transition  matrix  can  be  written  as 

^(7.0=1  (2.5) 

0,  otherwise. 

The  nearest- neighbor  approximation  in  (2.4)  appears  to  work  well  in  practice.  More 
sophisticated  approximations  result  in  transition  matrices  not  just  having  one  or  zero 
elements,  but  also  fractional  values  in  between  [Abbel,  2012],  This  increases  memory  and 
computational  costs  while,  to  our  experience,  not  providing  much  increase  in  accuracy. 

The  deterministic  state  transition  matrix  corresponds  to  a  state  machine  similar  to 
Figure  2.3.  In  this  figure,  x[n]  G  {xM,xs,xT},  7 [n]  G  {71,72}  and  £[ra]  G  {(1,  (2},  which 
means  there  are  three  states  possible,  two  actions  (low-level  control)  available,  and  two 
random  outcomes.  Note  that  given  what  the  randomness  is,  the  transition  is  deterministic 
in  this  graph. 

To  derive  the  Markov  chain  model  we  then  need  to  determine  a  policy  n,  which  is 
the  high-level  control  picking  the  right  low-level  control  action  at  each  step.  Chapter  5 
is  devoted  to  deriving  optimal  and  robust  policies.  Let  the  decided  policy  for  the  state 
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(Ci,7i)  or  ((2,72) 


Figure  2.3:  A  state  machine  with  three  states  {xm,xTixs)i  two  available  actions 
(Cl;  C2),  and  two  possible  randomness  (71, 72)-  Many  hybrid  systems  can  be  treated  as 
a  finite-state  machine.  In  case  the  dynamics  are  not  discrete  already,  they  need  to  be 
discretized  with  a  Poincare  section.  In  addition,  if  the  randomness  and  state  sets  are 
continuous,  they  should  be  meshed  for  finite  randomness  and  state  sets. 


machine  in  Figure  2.3  be 


77  (x  [n] ,  7  [n] )  =  < 


C2  if  x[n\ 
Ci  if  x[n] 
Ci  if  x[n] 


xM  and  7[n]  =  71, 
xM  and  7  [n]  =  72, 
xt  and  7'[n]  =  7l5 


(2.6) 


(2  if  x[n]  =  xt  and  7(71]  =  72. 

Notice  that  policy  does  not  determine  a  control  action  for  state  x$  since  nothing  can  be 
done  differently  at  an  absorbing  state.  When  £[71]  =  7r(a;[7i],  7(77])  is  used,  (2.2)  becomes 


x[n  +  1]  =  p(x[n],j[n],Tr(x[n],'y[n]))i 


(2.7) 


which  is  a  function  of  the  state  and  randomness  only.  The  result  is  illustrated  in  Fig¬ 
ure  2.4. 

Although  in  this  chapter  we  assume  perfect  state  information  and  randomness  esti¬ 
mation  are  available  to  the  high-level  control,  in  Chapter  5  we  study  the  more  general 
case. 
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72 


Figure  2.4:  A  state  machine  with  three  states  ( xm,xt,%s )  and  two  possible  random¬ 
ness  (71,72),  which  can  be  obtained  from  the  finite-state  machine  of  Figure  2.3  after 
applying  policy  ir  defined  in  (2.6).  Alternatively,  this  figure  is  what  the  model  would 
look  like  if  there  was  a  single  low-level  controller  too.  The  policy  in  that  case  would 
be  using  the  only  available  controller. 

2.3  Stochasticity  for  Markov  Decision  Process 


Given  deterministic  state  transition  matrix,  the  last  step  before  obtaining  a  Markov 
chain  is  to  assume  a  distribution  for  randomness  formulated  as 

Pr(7)  :=  Pr[rf[n]  =  7).  (2.8) 

Then  the  stochastic  state-transition  matrix 

Tij  :=  Pr(x[n  +  1]  =  Xj  \  x[n]  =  Xi),  (2.9) 

which  represents  the  corresponding  Markov  Chain,  can  be  calculated  by 

T  =  ^Pr(7)Td(7,0.  (2.10) 

7er 

In  case  £[n]  =  7r(a;[n], 7[n]),  the  stochastic  matrix  becomes 

r  =  ^JK1)r'(1,i(i,1)|.  (2.11) 

7er 

This  last  equation  will  be  updated  when  we  consider  noise  in  state  information  and 
randomness  estimation  in  Chapter  5  . 
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For  Figure  2.4,  let  Pr(7i)  =  p,  which  means  the  probability  of  y[n]  being  is  p.  The 
result  is  the  Markov  chain  shown  in  Figure  2.5.  More  complicated  systems  end  up  being 
surprisingly  similar  to  this  figure.  Note  that  in  case  p=  1,  only  state-M  observed  if  this 
was  the  initial  state.  So,  the  walking  would  be  stable.  However,  no  matter  how  close  to 
1,  if  p  does  not  equal  to  1,  the  probability  of  moving  to  the  stable  state  goes  to  1  in  time. 


1  —  p 


Figure  2.5:  Markov  chain  obtained  from  the  finite-state  machine  in  Figure  2.4  by 
assuming  Pr(7i)  =  p.  Note  that  State-S  is  absorbing.  Moreover,  State-M  is  metastable 
if  p  is  close  to  but  smaller  than  1.  This  graph  also  looks  like  the  following  coin- flip 
game.  Consider  tossing  an  unfair  coin,  for  which  the  probability  of  having  tails  is  p. 
When  the  number  of  flips  before  two  heads  in  a  row  is  of  interest,  three  states  are 
possible:  (S)  Heads-heads,  (M)  Tails  in  the  last  flip  (including  ‘not-flipped  yet’),  (T) 
Tails-Heads  (including  ‘flipped  once  and  it  was  heads’). 


At  this  point,  it  is  important  to  note  that  when  a  policy  is  not  yet  determined,  but 
stochasticity  is  added,  instead  of  a  Markov  chain,  we  obtain  a  Markov  decision  process. 
The  reason  why  we  preferred  to  add  stochasticity  at  the  last  step  to  obtain  a  Markov 
chain  becomes  clear  in  Chapter  4.  In  summary,  this  order  clarifies  the  following  property. 
By  discretization,  say  the  true  dynamics  of  the  system  are  approximated  as  a  finite-state 
machine  that  is  similar  to  Figure  2.4.  Then,  we  can  easily  and  very  quickly  compute  the 
expected  number  of  steps  before  failure  for  different  stochasticities  in  the  disturbance. 
Also,  we  wanted  to  emphasize  that  the  discretization  process  is  deterministic. 
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2.4  Eigenanalysis  for  Stability  Metric  Development 

Treating  the  system  as  a  Markov  chain  lends  itself  to  a  straightforward  reliability 
measurement.  Let  Ai  and  A2  be  the  largest  two  eigenvalues  of  the  transition  matrix 
associated  with  the  Markov  chain  in  consideration.  Since  all  failure  modes  are  modeled 
to  be  an  absorbing  state,  Ai  =  1  and  the  corresponding  state  distribution,  which  is  called 
stationary  distribution,  has  all  the  probability  mass  at  the  failure  state. 

As  explained  in  detail  in  Appendix  A,  after  taking  several  steps  without  failing,  the 
robot’s  state  distribution  typically  almost  converges  to  metastable  distribution.  Then,  as 
shown  in  Figure  2.6,  the  robot  is  able  to  take  the  next  step  successfully  with  probability 
A2  and  it  fails  otherwise.  This  allows  very  easy  quantification  of  stability  by 

Expected  Number  of  Steps  Before  Failure  =  - — — — .  (2-12) 

1  —  A2 

In  this  calculation  we  also  count  the  step  leading  to  failure.  So,  the  expected  number  of 
steps  before  failure  is  larger  than  or  equal  to  1. 


Figure  2.6:  Metastable  dynamics.  As  explained  in  detail  in  Appendix  A,  after  taking 
several  steps  without  failing,  the  robot’s  state  distribution  typically  almost  converges 
to  metastable  distribution ,  which  is  represented  by  the  green  ball  on  the  left.  The  robot 
is  able  to  take  the  next  step  successfully  with  probability  A2,  which  implies  mapping 
back  to  the  metastable  distribution.  Otherwise  the  robot  transitions  to  failure  state, 
which  is  represented  by  the  red  ball  on  the  right.  Then,  the  stability  of  walking  on 
rough  terrain  can  be  easily  quantified  by  (2.12). 
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2.5  Toy  Example:  Rimless  Wheel 

As  a  toy  example,  in  this  section  we  study  the  system  shown  in  Figure  2.7,  where  the 
slope  may  change  at  each  step1.  After  the  introduction  of  passive  dynamic  walking  in 
[McGeer,  1990]  over  two  decades  ago,  the  rimless  wheel  has  become  very  popular  in  loco¬ 
motion  research  because  this  uncomplicated  walker  keeps  many  of  the  essential  properties 
of  dynamic  walking  robots  [Bhounsule,  2014],  For  simplicity,  we  focus  on  forward  walk¬ 
ing  only  and  assume  the  mass  is  lumped  to  the  center  of  the  robot  and  equals  m=l  kg, 
length  of  each  leg  is  given  by  1=1  m,  number  of  legs  is  8,  gravitational  acceleration  is 
ac=9.81  m/s2,  and  legs  never  slip,  i.e. ,  the  friction  is  always  enough.  For  a  more  general 
and  detailed  approach  to  the  rimless  wheel  compared  to  this  section  refer  to  [Saglam 
et  al.,  2014], 


Figure  2.7:  The  rimless  wheel  as  depicted  in  [Byl  and  Tedrake,  2009].  We  define  for¬ 
ward  direction  to  be  to  the  right  (i.e.,  clockwise).  The  state  for  this  walker  is  two 
dimensional,  consisting  of  the  angle  0  and  velocity  oj  =  6.  Negative  7  values  corre¬ 
spond  to  downhill. 

The  single  support  phase  is  when  only  one  leg  is  in  contact  with  the  ground.  This 
phase  has  continuous  pendulum  dynamics  given  by  6  =  k  sin(0).  The  leg  in  contact  with 
the  ground  is  referred  to  as  the  stance  leg.  On  the  other  hand,  double  support  phase  is 

1See  Appendix  C  for  terrain  modeling. 
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when  two  legs  contact  the  ground,  which  is  well  described  as  an  instantaneous  impact 
event.  The  jump  map  is  obtained  using  conservation  of  momentum  as  u+  =  cos(2a)a;, 
where  u  =  9  is  the  angular  velocity  [McGeer,  1990].  Thus,  the  rimless  wheel  is  a  hybrid 
dynamical  system  which  can  be  expressed  in  the  form  of  (2.1).  Walking  is  single  and 
double  support  phases  following  one  another.  A  step  occurs  between  two  consecutive 
impacts  and  includes  one  of  these  impacts.  In  this  thesis,  we  arbitrarily  define  it  to 
include  the  first  of  the  two  impacts.  Failure  refers  to  not  being  able  to  take  another  step. 

For  the  rimless  wheel  example,  let’s  take  a  Poincare  section  at  6  —  0.  Then,  given 
u>[n\  >  0  and  tu[n  +  1]  >  0,  the  next  velocity  of  the  robot  (when  6  =  0)  is  given  by 

u j[n+  1]  =  \/cos2  (2a)  (u;2[n]  +  2k(1  —  cos(7[n]  +  «)))  —  2k(1  —  cos(y[n]  —  a)),  (2.13) 

which  can  be  easily  verified  using  the  conservation  of  energy  during  the  flows  and  con¬ 
servation  of  angular  momentum  at  the  impacts.  If  the  argument  of  the  square  root  in 
the  equation  above  turns  out  to  be  negative,  then  the  robot  did  not  actually  intersect 
the  Poincare  section  again  which  means  failure.  Note  that  (2.13)  is  a  special  case  of  (2.2) 
with  no  control  action. 

As  mentioned,  the  method  of  this  chapter  requires  finite  and  discrete  slope  and  state 
sets.  For  the  state  set  we  use 

n  =  {w  ;  ^  e  Z,  0  <  w  <  2.5}  ,  (2.14) 

And  the  slope  set  is 

r  =  {7  :  ^  e  z,  0  <  7  <  16  j ,  (2.15) 

Both  sets  can  be  made  denser  for  higher  accuracy  or  coarser  for  faster  computation. 
Because  of  low  dimensionality,  computation  time  is  very  small  (less  than  a  second).  Also, 
the  accuracy  gained  while  calculating  the  expected  number  of  steps  with  denser  sets  is 
negligible. 
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After  determining  the  slope  and  state  sets,  we  calculate  oj[n  +  1]  for  each  u>  G  and 
7  G  T  using  (2.13)  and  form  the  deterministic  state  transition  matrix  as  explained  in 
Section  2.2.  To  obtain  Markov  chain  representation  we  assume  the  slopes  are  normally 
distributed  with  mean  /x7  and  standard  deviation  ay,  i.e.,  y[n]  ~  A/"(/y,  a7).  The  Markov 
chain  obtained  and,  as  a  result,  the  expected  number  of  steps  before  failure  calculated 
using  (2.12)  depends  on  the  distribution  of  slopes  as  depicted  in  Figure  2.8. 


Figure  2.8:  Expected  number  of  steps  before  failure  for  rimless  wheel  as  a  function  of 
stochasticity.  Slopes  ahead  of  the  robot  are  assumed  to  be  normally  distributed  with 
mean  /y,  and  standard  deviation  ey.  When  /r7  is  small  enough  for  constant  rolling, 
smaller  cr7  implies  higher  expected  number  of  steps  before  failure. 
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Discretization  of  High-Dimensional 
States 

In  applying  the  tool  of  Chapter  2,  one  of  the  steps  was  meshing  the  state  space  to  represent 
it  with  finitely  many  elements,  e.g.,  {0, 77C-,  1}  is  a  uniform  discretization  of  [0, 1] 

with  k  elements.  However,  uniformly  meshing  the  entire  state  space  becomes  intractable 
for  high  dimensional  systems.  Thus,  as  mentioned  in  Section  2.2,  there  is  potentially 
a  curse  of  dimensionality  for  high  degrees-of-freedom  robots.  However,  if  the  intrinsic 
dimension  of  the  reachable  state  space  is  low,  meshing  can  still  be  achieved  for  systems 
with  high-dimensional  states  as  explained  in  this  chapter. 

3.1  Avoiding  the  Curse  of  Dimensionality 

The  critical  observation  toward  avoiding  the  curse  of  dimensionality  is  the  fact  that 
only  the  reachable  state  space  under  the  given  control  law  needs  to  be  meshed  instead  of 
the  entire  state  space.  Let  us  illustrate  with  the  simple  biped  model  shown  in  Figure  3.1. 
This  walker  has  two  massless  legs  and  one  actuator  which  controls  the  inter-leg  angle. 
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Assuming  second  order  dynamics,  the  state  space  is  4  dimensional  for  this  walker,  con¬ 
sisting  of  2  angles  and  2  velocities.  To  begin  with,  assume  that  immediately  after  each 
impact,  the  controller  ensures  q  —  2a  and  q  =  0.  In  this  case,  this  walker  is  identical  to 
the  rimless  wheel  we  studied  in  Section  2.5  where  2a  corresponds  to  the  fixed  inter-leg 
angle.  Note  that  although  this  simple  biped  has  a  4D  state,  the  reachable  state  space 
under  the  mentioned  control  law  is  only  2  dimensional. 


Figure  3.1:  Illustration  of  a  very  simple  biped  model.  The  legs  are  massless  and  the 
only  actuator  controls  the  inter-leg  angle  q.  Assuming  that  immediately  after  each 
impact  the  controller  ensures  q  =  2a  and  q  =  0,  then  this  biped  behaves  like  the 
rimless  wheel  in  Figure  2.7. 

In  reality,  the  controller  may  not  be  able  to  reach  its  references  before  every  impact. 
To  maximize  the  allowed  time  for  converge,  we  take  the  Poincare  section  just  before 
impacts  instead  of  6  =  0  as  in  Section  2.5.  However,  even  after  this  choice,  the  controller 
may  not  be  able  to  perfectly  convergence  in  some  cases,  so  the  reachable  state  space 
typically  turns  out  to  be  a  quasi- 2D  manifold.  Note  that  the  rimless  wheel  has  1  degree- 
of- freedom  (DOF)  but  no  actuators,  and  the  massless  legged  biped  has  2  DOF  and  1 
actuator.  So,  both  these  walkers  are  underactuated  by  1  DOF.  Similarly,  the  5-link  biped 
modeled  in  Appendix  B  is  also  underactuated  by  1  DOF,  because  it  has  5  links  and  4 
actuators.  As  a  result,  the  reachable  state  space  under  a  given  control  law  turns  out  to 
be  a  quasi-2D  manifold  for  the  5-link  biped  also  [Saglam  and  Byl,  2013a]. 
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We  note  that  the  reachable  state  space  being  a  lower  dimensional  manifold  is  not  an 
intrinsic  requirement  for  our  meshing  technique,  which  is  presented  next.  This  section 
is  rather  an  explanation  why  the  following  method  does  not  explode  and  instead  stays 
tractable  when  applied  to  high  DOF  robots. 

3.2  Explored  Meshing 

Determining  the  state  set  is  difficult  because  we  are  studying  high  dimensional  sys¬ 
tems,  e.g.,  the  5-link  walker  has  a  10  dimensional  state  (i.e.,  positions  and  velocities). 
So,  it  is  not  feasible  to  uniformly  and  densely  cover  a  hypercube  that  includes  the  reach¬ 
able  state  space.  However,  meshing  the  reachable  state  space  can  be  achieved  by  starting 
from  an  initial  state  set  Xt  with  very  small  number  of  points  (one  “good”  state,  such  as 
the  fixed  point  under  mean  disturbance,  is  enough)  and  deterministically  expanding  by 
iteratively  simulating. 

Our  algorithm  works  as  follows.  We  initially  start  by  setting  X  =  {x  £  Xt  :  x  yf  aq}1, 
which  corresponds  to  all  non-failure  states  that  are  not  simulated  yet.  Then  we  start  the 
following  iteration:  As  long  as  there  is  a  state  x  £  X,  simulate  to  find  all  possible  p(x,  7,  () 
and  remove  x  from  X.  For  the  points  newly  found,  check  their  distance  to  the  other  states 
in  X.  If  the  distance  is  larger  than  some  threshold  dthr,  be.,  the  point  is  far  enough  from 
all  existing  mesh  points,  then  add  that  point  to  X  and  X.  We  call  the  mesh  obtained  by 
this  method  explored-mesh  and  present  the  pseudocode  in  Algorithm  1. 

Using  the  right  distance  metric  is  crucial  in  ensuring  that  A"  has  a  small  number 
of  states  while  accurately  covering  the  reachable  state  space.  Standardized  (normalized) 
Euclidean  distance  turned  out  to  be  extremely  useful  as  it  dynamically  adjusts  the  weights 

for  each  dimension  according  to  its  standard  deviation  at  any  mesh  iteration.  The  distance 

1As  explained  in  Appendix  A,  without  loss  of  generality  x\  represents  all  the  failures  no  matter  how 
robot  failed. 
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Algorithm  1  Explored  meshing  algorithm 

Input:  Initial  set  of  states  Xi:  Randomness  set  T ,  Controller  set  Z,  and  threshold  dis¬ 
tance  dfhr 

Output:  Final  set  of  states  X,  and  state-transition  map 
1:  X  t—  Xi  (except  x\  ) 

2:  X  <r-  Xi 

3:  while  X  is  non-empty  do 
4:  X2  <-  X 

5:  empty  X 

6:  for  each  state  x  E  X2  do 

7:  for  each  slope  7  G  T  do 

8:  for  each  controller  (  G  Z  do 

9:  Simulate  a  single  step  to  get  the  final  state  x  when  initial  state  is  x, 

slope  ahead  is  7,  and  controller  (  is  used  (Calculate  x  =  p(x,  7,  £)). 
Store  this  information  in  the  state-transition  map 
10:  if  robot  did  not  fall  (x  ^  X\)  and  d{x,  X)  >  dthr  then 

11:  add  x  to  X 

12:  add  x  to  A" 

13:  end  if 

14:  end  for 

15:  end  for 

16:  end  for 

17:  end  while 
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of  a  vector  x  from  X  is  calculated  as 


-)  J,  (3.1) 

where  rt  is  the  standard  deviation  of  i’th  dimension  of  all  existing  points  in  set  X.  In 
addition,  the  closest  point  in  X  to  x  is  given  by 

c(x,X)  :=  argmin  / ^  ^  1  .  (3.2) 

Our  algorithm  allows  us  to  increase  the  accuracy  of  the  final  mesh  at  the  expense  of 
producing  a  higher  number  of  states  (larger  X)  by  decreasing  threshold  distance  dthr  for 
states  and  discretization  length  d7  used  for  randomness  set  in  (2.3). 

While  meshing  the  whole  10D  state  space  for  the  5-link  walker  we  study  in  this  thesis 
is  infeasible,  this  method  is  able  to  avoid  the  curse  of  dimensionality  because  the  reachable 
state  space  is  actually  a  quasi-2D  manifold  as  explained  in  the  first  section. 

For  the  rimless  wheel,  Figure  3.2  shows  the  explored  mesh  obtained  using  dthr  —  0.025 
and  dry  =  0.5.  Points  in  this  figure  correspond  to  possible  states  just  before  impacts.  The 
probability  of  being  at  each  state  is  shown  with  color.  Convex  hulls  of  states  that  cover 
the  0.9,  0.99,  0.999,  and  0.9999  of  the  state  probability  distribution  are  also  drawn. 

In  Section  2.5  the  state  mesh  was  only  1  dimensional  because  we  took  a  Poincare 
section  at  6  =  0.  As  we  focus  on  the  states  just  before  the  impact  in  this  chapter,  the 
reachable  state  space  is  rather  2  dimensional.  However,  we  obtain  similar  results  using 
both  methods.  In  particular,  let’s  consider  uniformly  distributed  slope  changes  with  mean 
/i7  =  7°  and  standard  deviation  er7  =  1°  for  the  rimless  wheel.  Under  these  circumstances, 
we  calculated  the  expected  number  of  steps  to  be  1.79  x  10'  and  2.1  x  10'  in  the  previous 
chapter  and  using  explored  mesh  of  Figure  3.2  respectively. 


d(x,  X)  :=  min  <j  ^ 
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Figure  3.2:  Explored  meshing  method  applied  on  the  rimless  wheel  in  Figure  2.7. 
Threshold  distance  dthr  =  0.025  and  slope  set  discretization  length  d7  =  0.5  were 
used  to  obtain  a  mesh  with  252  states  shown  in  this  figure.  Colored  bar  depicts  the 
probability  of  being  at  a  state  at  the  end  of  a  step  while  walking.  Convex  hulls  cover 
0.9,  0.99,  0.999,  and  0.9999  of  the  state  probability  distribution.  The  expected  number 
of  steps  is  calculated  to  be  2.1  x  107  using  this  mesh. 


3.2.1  Two  Tricks 


Two  important  tricks  to  make  the  explored  meshing  algorithm  run  faster  are  as  fol¬ 
lows.  First,  the  randomness  set  allows  a  natural  cluster  analysis.  The  distance  comparison 
for  a  new  point  can  be  made  only  with  the  points  that  are  associated  with  the  same  slope. 
This  might  result  in  more  points  in  the  final  mesh,  but  it  speeds  up  the  meshing  and 
later  calculations  significantly.  Secondly,  fix  a  controller  (  and  a  state  x.  Then  we  can 
simulate  p(x,  min(T),  ()  and  then  potentially  extract  p(x,  7,  ()  for  all  7  6  T.  To  illustrate, 
in  order  for  the  robot  to  experience  an  impact  at  —30  degree,  it  has  to  pass  through  all 
the  possible  impact  points  with  higher  degrees,  and  we  can  extract  all  impact  cases  from 
a  single  simulation. 
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Low-Level  Control  Action 


Performance  of  highly-dynamic  walkers  is  often  quantified  by  speed  or  energy  consump¬ 
tion  under  deterministic  conditions  [Hobbelcn  and  Wisse,  2007].  However,  legged  robots 
need  “good”  disturbance  rejection  to  operate  reliably  in  real-world  environments,  and 
achieving  this  goal  requires  quantifying  robustness.  Out  of  the  small  number  of  metrics 
proposed  for  measuring  robustness  of  dynamic  walkers,  the  L2  gain  calculation  in  [Dai 
and  Tedrake,  2013]  was  successfully  extended  and  implemented  on  a  real  robot  in  [Grif¬ 
fin  and  Grizzle,  2015].  Alternatively,  largest  terrain  disturbance  was  maximized  in  [Pratt 
et  al.,  2001]  and  trajectories  were  optimized  to  replicate  human-walking  data  in  [Ames, 
2012],  Instead,  this  thesis  adopts  the  metric  explained  in  Chapter  2,  which  is  used  to  opti¬ 
mize  and  benchmark  control  action  in  this  chapter.  We  argue  that  this  approach  provides 
better  performance  on  rough  terrain  and  it  is  more  powerful  for  benchmarking  purposes. 
Mostly,  we  study  the  particular  control  strategy  formulated  in  Appendix  D  as  a  case 
demonstration.  However,  two  other  control  schemes  are  also  optimized  for  benchmarking 
purposes.  The  first  one  is  the  now-familiar  hybrid  zero  dynamics  approach  [Westervelt 
et  ah,  2003]  and  the  other  is  a  method  using  piece-wise  reference  trajectories  with  a 
sliding  mode  control  [Saglam  and  Byl,  2013b]. 
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For  optimization,  we  employ  the  “fminsearch”  in  MATLAB®,  which  uses  the  derivative- 
free  method  proposed  by  [Lagarias  et  al.,  1998]  to  find  the  unconstrained  minimum.  In 
case  we  impose  constraints,  we  use  the  extended  version  provided  by  [Oldenhuis,  2009] 
instead. 


4.1  Optimization  of  a  Given  Controller 


As  mentioned,  control  action  is  often  designed  to  minimize  energy  consumption  or 
maximize  speed.  For  the  first  one,  Cost  of  Transport  (COT)  gives  the  non-dimensionalized 
energy  expenditure  per  unit  weight  and  unit  distance  [Tucker,  1975].  It  is  defined  as  the 
total  mechanical  energy  spent,  divided  by  weight  times  distance,  i.e., 


COT  = 


W 

rrigd ’ 


(4.1) 


where  m  is  the  mass,  g  is  the  gravitational  constant,  and  d  is  the  distance  traveled.  In 
this  paper  we  use  the  conservative  definition  of  “energy  spent”  by  regarding  negative 
work  is  also  done  by  the  robot,  i.e. 


II  |  WpOSit ive  |  T  |  Wnegcitive  \ 


w, 


positive 


w. 


negative 


(4.2) 


so  that  both  acceleration  and  breaking  require  power,  but  there  is  no  regenerative  break¬ 
ing.  Compared  to  energy  efficiency,  measuring  speed  under  deterministic  conditions  is 
even  more  straightforward. 

To  evaluate  the  stability  obtained  by  optimizing  for  these  metrics,  we  consider  nor¬ 
mally  changing  the  slopes  in  front  of  the  robot  as  in  Figure  C.la.  We  fix  the  long-term 
mean  slope  to  be  /r7  =  0  and  vary  the  standard  deviation  <r7  to  calculate  resulting 
expected  number  of  steps  before  failure  depicted  in  Figure  4.1.  The  results  reveal  that 
optimizing  for  these  metrics  results  in  sensitivity  to  perturbations,  thus  performing  poorly 
on  rough  terrain  as  expected. 
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On  the  other  hand,  now  we  have  a  mathematical  formulation  for  the  expected  number 
of  steps  before  failure,  we  can  finally  optimize  for  it  using 


maximize  {the  number  of  steps}  =  maximize  < - —  >  ,  (4-3) 

controller  controller  II  —  A  2  I 

parameters  parameters  v  J 


where  the  right  hand  side  comes  from  (2.12).  To  calculate  this  number  we  assume  slopes 
are  normally  distributed  with  zero  mean  and  standard  deviation  cr7  =  5°.  Noting  that 
y-axis  is  in  logarithmic  scale,  Figure  4.1  demonstrates  that  optimizing  for  stability  using 


(4.3)  provides  extreme  improvements  in  rough  terrain  walking  performance.  For  cr7  =  5°, 
while  the  other  two  are  expected  to  take  less  than  1.5  steps,  controller  optimized  for 


stability  is  expected  to  take  around  10  thousand  steps.  The  difference  is  even  more 


dramatic  for  smaller  but  nonzero  terrain  roughness. 


Cl) 


Figure  4.1:  Optimization  of  low-level  control  action  using  different  metrics.  Noting 
that  the  y-axis  is  in  logarithmic  scale,  figure  reveals  that  optimizing  for  stability 
using  (4.3)  provides  extreme  improvements  in  rough  terrain  walking  performance. 
This  figure  shows  expected  number  of  steps  before  falling  calculated  using  (2.12) 
versus  <r7.  Slopes  ahead  of  the  robot  are  assumed  to  be  normally  distributed  with 
/u  =  0.  Numerical  limit  at  1014  is  due  to  machine  precision,  and  it  corresponds  to  a 
million  tours  around  the  world  for  a  human-sized  robot  with  half  a  meter  step  length. 
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Note  that  the  number  of  points  in  the  final  mesh,  and  the  accuracy  obtained  from 
(2.12)  with  this  mesh,  are  inversely  related  to  parameters  dt}ir  and  d7,  where  dthr  is  the 
threshold  distance  in  Algorithm  1  and  d7  is  the  discretization  length  in  (2.3).  We  claim 
that,  as  dry  — »  0  and  dthr  ->  0,  the  mesh  captures  the  true,  hybrid  dynamic  system 
dynamics,  and  as  a  result,  the  expected  number  of  steps  before  failure  for  the  controllers, 
converges.  To  test  the  accuracy  of  our  calculations,  we  evaluate  the  performance  of  the  top 
controller  in  Figure  4.1  using  (dthr,  d7)  G  {(2,  2.5),  (1, 1),  (0.5, 0.5)}.  The  fine  mesh,  which 
is  obtained  using  ( dthr,d-y )  =  (0.5,  0.5),  has  77,329  states,  and  the  coarse  mesh,  which  is 
a  result  of  using  (dthr,  d7)  =  (2,  2.5),  consists  of  only  372  states.  Figure  4.2  demonstrates 
the  convergence  of  performance  quantification  with  indistinguishable  curves. 


Figure  4.2:  Verification  of  performance  quantification.  The  fine  mesh  is  obtained  using 
(i dthr,d 7)  =  (0.5,  0.5),  and  the  coarse  mesh  is  a  result  of  using  (dthr,d1)  =  (2,2.5). 
Closely  matching  curves  and  data  points  indicate  our  performance  quantification  is 
accurate.  This  figure  shows  expected  number  of  steps  before  falling  calculated  using 
(2.12)  versus  <r7.  Slopes  ahead  of  the  robot  are  assumed  to  be  normally  distributed 
with  fij  =  0.  Numerical  limit  at  1014  is  due  to  machine  precision,  and  it  corresponds 
to  a  million  tours  around  the  world  for  a  human-sized  robot  with  half  a  meter  step 
length. 
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For  further  verification  of  our  results,  we  also  carry  Monte  Carlo  simulations  for  a  € 
{7°,  8°,  9°,  10°}.  For  each  case  we  simulate  104  times  until  failure.  The  average  number  of 
steps  are  also  provided  in  Figure  4.2.  Closely  matching  results  validates  our  approach  once 
again.  At  this  point  we  would  like  to  clarify  some  clear  advantages  of  our  methodology 
over  Monte  Carlo  simulations.  While  obtaining  a  single  point  in  Figure  4.2  required 
“104x  average  number  of  steps”  simulations  for  Monte  Carlo  method,  all  of  the  curve 
associated  with  the  coarse  mesh  was  obtained  by  372  simulations  only!  So,  with  far  fewer 
simulations  we  were  able  to  create  a  whole  curve  instead  of  getting  a  single  data  point. 
This  is  because  once  the  system  is  discretized,  addition  of  stochasticity  and  calculation 
of  A2  are  fast  operations.  Note  that  Monte  Carlo  simulations  become  quickly  intractable 
as  the  number  of  steps  increases,  whereas  the  numerical  limit  in  our  method  is  very  high 
and  can  be  enlarged  by  increasing  the  machine  precision  when  necessary.  Furthermore, 
calculating  the  curve  is  just  one  of  the  outcomes  in  our  methodology.  This  complete  tool, 
for  example,  also  provides  invaluable  information  for  high-level  control  design  as  we  will 
see  in  the  next  chapter. 

[Benallegue  and  Laumond,  2013]  presents  a  method  based  on  Monte  Carlo  simulations 
to  compute  the  expected  number  of  steps  before  failure  when  it  is  very  high.  The  method 
relies  on  the  property  that  for  a  fixed  controller  dynamic  walkers  return  very  close  to 
their  limit-cycle  over  and  over  again  as  they  keep  walking.  In  a  future  work,  this  method 
will  be  used  to  verify  the  curves  in  Figure  4.2  also  for  low  cy  values. 

4.2  Benchmarking  Controller  Schemes 

In  addition  to  optimization  capabilities,  benchmarking  is  a  powerful  use  of  our  math¬ 
ematical  tool,  which  we  already  employed  to  present  Figure  4.1  and  4.2.  In  this  section, 
we  look  at  how  different  controller  schemes  optimized  for  stability  compare.  The  first  one 
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is  the  now-familiar  hybrid  zero  dynamics  (HZD)  approach  [Westervelt  et  ah,  2003].  The 
second  is  a  method  using  piece-wise  reference  trajectories  with  a  sliding  mode  control 
(PC-SMC)  [Saglam  and  Byl,  2013b].  Finally,  extended  hybrid  zero  dynamics  (EHZD) 
controller  is  the  scheme  adopted  throughout  this  paper,  which  is  explained  in  detail  in 
Appendix  D.  We  optimize  all  three  controller  schemes  using  (4.3).  Figure  4.3  demon¬ 
strates  EHZD  is  a  more  capable  controller  algorithm  compared  to  the  other  two.  As  a 
future  work,  we  are  extremely  interested  in  adding  more  controller  schemes  to  this  figure, 
such  as  time- varying  controllers,  ZMP-based  algorithms  (when  applicable),  and  central 
pattern  generators  [Brown,  1914],  [Kuo,  2002].  We  also  encourage  other  researchers  to  use 
our  methods  to  demonstrate  capabilities  of  the  controllers  they  designed  or  optimized. 


Cl) 


Figure  4.3:  Benchmarking  various  controller  schemes.  Extended  hybrid  zero  dynamics 
controller  turns  out  to  be  more  capable  in  rough  terrain  walking.  However,  the  goal  of 
this  figure  is  to  show  benchmarking  capability  of  our  mathematical  tool.  This  figure 
shows  expected  number  of  steps  before  falling  calculated  using  (2.12)  versus  cj7.  Slopes 
ahead  of  the  robot  are  assumed  to  be  normally  distributed  with  /i7  =  0.  Numerical 
limit  at  1014  is  due  to  machine  precision,  and  it  corresponds  to  a  million  tours  around 
the  world  for  a  human-sized  robot  with  half  a  meter  step  length. 
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4.3  Comparison  to  Other  Work 


We  finally  compare  our  results  with  the  state-of-the-art.  Like  this  thesis,  [Dai  and 
Tedrake,  2013]  adopts  the  RABBIT  robot  model  provided  in  Appendix  B.  However, 
instead  of  using  normal  distributions  as  we  have  done  so  far,  they  uniformly  distribute 
the  slopes  ahead  of  the  robot.  Then,  they  optimize  a  time- varying  control  structure  for 
the  L2  gain  to  minimize  deviations  due  to  ground  variations.  To  test  their  resulting 
controller,  they  assume  the  slopes  ahead  of  the  robot  is  normally  distributed  between 
-2  and  +2  degrees  and  simulate  40  times  until  failure.  The  robot  takes  20.325  steps 
on  average  as  depicted  in  Figure  4.4,  where  our  controller  greatly  outperforms  in  terms 
of  stability.  Note  that  to  obtain  the  curve  for  our  controller,  we  did  not  discretize  the 


Figure  4.4:  First  comparison  with  the  state-of-the-art.  Our  controller  outperforms 
the  [Dai  and  Tedrake,  2013]  and  we  were  able  to  present  our  results  with  more  data 
points  using  fewer  simulations.  This  figure  shows  expected  number  of  steps  before 
falling  calculated  using  (2.12)  versus  a.  Slopes  ahead  of  the  robot  are  assumed  to  be 
uniformly  distributed  between  ±a  degrees.  Numerical  limit  at  1014  is  due  to  machine 
precision,  and  it  corresponds  to  a  million  tours  around  the  world  for  a  human-sized 
robot  with  half  a  meter  step  length. 
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dynamics  again.  We  simply  used  our  discretization  from  the  previous  section  and  added 
a  different  form  of  stochasticity.  The  computation  took  only  several  seconds. 

We  would  like  to  explain  why  there  is  a  sudden  fall  in  expected  number  of  steps  around 
<r7  =  14°  in  Figure  4.4  unlike  the  previous  2  figures,  where  we  used  normal  distributions. 
Say  the  slopes  are  normally  distributed  between  ±a  degrees.  Imagine  a  robot  which  can 
walk  in  a  stable  manner  when  a  =  14  degrees.  Assume  the  robot  falls  in  two  steps  beyond 
this  limit.  Now,  if  we  look  at  a  =  12.5°,  then  we  should  expect  to  hit  the  numerical  limit 
in  Figure  4.4.  However,  if  a  =  15°,  then  there  is  around  1/152  ~  0.00444  chance  the  robot 
will  fail  after  the  following  two  steps,  which  corresponds  to  only  225  steps  on  average. 
On  the  other  hand,  when  we  employ  normal  distributions  we  consider  a  very  wide  range 
of  slopes  possible  with  finite  probabilities  for  all,  which  might  be  extremely  small  at  the 
tails.  But  this  gives  us  smooth  transitions  and  allows  calculating  any  a7  value,  whereas 
calculations  for  a  values  outside  slope  set  are  not  always  trustable.  That’s  why  Figure  4.4 
only  uses  the  data  points  shown  to  form  the  curve,  while  the  previous  two  figures  uses  a 
much  denser  standard  deviation  set,  so  each  data  point  are  not  shown  to  avoid  clutter. 

In  addition  to  Figure  4.4,  we  also  followed  the  method  in  [Dai  and  Tedrake,  2013] 
and  simulated  40  times  until  failure  independent  from  the  discretization  process.  Instead 
of  limiting  slopes  between  -2  and  +2  degrees,  where  our  robot  seems  unlikely  to  fall,  we 
used  -17.5  and  +17.5  degrees  to  take  77.525  steps  on  average  as  listed  in  Table  4.1. 


Table  4.1:  First  comparison  with  the  state-of-the-art.  Slopes  ahead  of  the  robot  are 
changing  uniformly.  Simulations  are  carried  40  times  until  failure. 


Terrain 

Average 

Roughness 

Number  of  Steps 

Hongkai  Dai  and  Russ  Tedrake,  2013 

[—2°,  2°] 

20.325 

Cenk  Oguz  Saglam  and  Katie  Byl,  2015 

[-17.5°,  17.5°] 

77.525 
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In  another  state-of-the-art  work,  [Griffin  and  Grizzle,  2015]  again  uses  uniform  dis¬ 
tribution,  but  this  time  instead  of  slopes  ahead  of  the  robot,  step  heights  change  as 
illustrated  in  Figure  C.lb.  They  simulate  50  times  until  failure,  but  they  limit  each  it¬ 
eration  to  104  steps  to  reduce  computational  cost.  They  report  that  their  controller  was 
able  finish  4  of  the  50  trials  when  step  heights  are  uniformly  distributed  between  ±4  cm. 
On  average  the  robot  takes  4,616  as  shown  in  Figure  4.5.  In  comparison,  although  our 


Figure  4.5:  Second  comparison  with  the  state-of-the-art.  Our  controller  also  outper¬ 
forms  the  [Griffin  and  Grizzle,  2015]  and  we  were  able  to  present  our  results  with  more 
data  points  using  fewer  simulations.  This  figure  shows  expected  number  of  steps  before 
falling  calculated  using  (2.12)  versus  a.  Step  heights  ahead  of  the  robot  are  assumed 
to  be  uniformly  distributed  between  ±a.  Numerical  limit  at  1014  is  due  to  machine 
precision,  and  it  corresponds  to  a  million  tours  around  the  world  for  a  human-sized 
robot  with  half  a  meter  step  length. 


controller  was  optimized  for  a  normally  distributed  sloped  terrain,  it  outperforms  on  a 

uniformly  distributed  step  terrain  as  shown  in  Figure  4.5.  Independently  from  this  figure, 

we  also  simulated  50  times  when  step  heights  are  uniformly  distributed  between  ±4  cm, 

and  our  robot  was  able  to  complete  all  trials  as  listed  in  Table  4.2.  We  should  also  note 

that  while  we  and  [Dai  and  Tedrake,  2013]  are  working  on  RABBIT  shown  in  Figure  1.4c, 
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[Griffin  and  Grizzle,  2015]  use  a  model  for  ATRIAS  pictured  in  Figure  1.4e. 


Table  4.2:  Second  comparison  with  the  state-of-the-art.  Step  heights  ahead  of  the  robot 
are  changing  uniformly.  Simulations  are  carried  50  times  until  failure  or  reaching  the 
limit  at  104  steps. 


Terrain 

Finished 

Roughness 

Trials 

Brent  Griffin  and  Jessy  Grizzle,  2015 

[—4, 4]  cm 

4/50 

Cenk  Oguz  Saglarn  and  Katie  Byl,  2015 

[—4, 4]  cm 

50/50 

4.4  Incorporating  Secondary  Metrics 

Finally,  once  stability  is  quantified,  it  is  straightforward  to  incorporate  other  met¬ 
rics.  To  illustrate,  the  controller  parameters  obtained  using  (4.3)  turns  out  to  produce  a 
relatively  slow  walking  motion  as  shown  in  Table  4.3.  However,  if  we  allow  extra  room 
for  the  optimization  to  also  consider  other  metrics,  we  obtain  walking  gaits  that  are 
faster  and  more  energy  efficient  as  the  table  demonstrates.  The  cost  function  given  by 
2  x  COT— speed  was  empirically  chosen  and  the  potential  trade-off  can  be  easily  adjusted 
depending  on  the  application. 

The  fact  that  optimal  controller  depends  on  what  the  objective  (or  cost)  function  is 
motivates  adopting  the  hierarchical  control  structure  of  the  next  chapter.  In  particular, 
section  5.3  studies  the  advantages  of  switching  between  the  two  controllers  listed  in 
Table  4.3. 
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Table  4.3:  Incorporating  secondary  metrics  on  low-level  control  design.  The  second 
column  gives  the  expected  number  of  steps  before  failure  when  slopes  are  normally 
distributed  with  zero  mean  and  standard  deviation  equals  5  degrees. 


Expected  Number  of  Steps 

Speed 

COT 

.  .  /  1  1 

maximize  < - —  > 

controller  11  —  A  2  1 
parameters  v  J 

«  104 

0.462 

0.212 

minimize  {2  x  COT  —  speed} 

controller 

parameters 

«  102 

0.7 

0.185 

subject  to  - - —  >  102 

1  —  A2 
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What  is  very  intuitive  but  has  lacked  sufficient  attention  in  the  robotics  community 
is  that  humans  do  not  walk  the  same  way  on  every  ground  type.  They  modify  their 
walk  depending  on  various  conditions,  such  as  whether  the  surface  is  pavement  or  clay, 
whether  the  ground  is  triangular  (slopes),  or  rectangular  (stairs),  whether  it’s  uphill 
or  downhill,  and  whether  there  are  obstacles  on  the  way,  just  to  name  some  terrain 
features.  It  is  not  possible  or  necessary  to  design  a  specific  controller  for  each  specific 
case.  However,  if  we  have  multiple  controllers  (potentially  designed  with  different  general 
cases  in  mind)  available,  the  robot  may  increase  stability  by  appropriately  switching 
among  controllers.  While  switching  using  only  internal  (proprioceptive)  state  information 
(blind- walking)  is  advantageous  [Park  et  ah,  2012],  [Saglam  and  Byl,  2014c],  a  dramatic 
improvement  is  obtained  with  use  of  upcoming  terrain  information  [Saglam  and  Byl, 
2013b],  Instead,  work  to  date  has  typically  focused  either  on  remaining  robust  when  blind 
to  upcoming  terrain  [Raibert  et  ah,  2008],  [Park  et  ah,  2013]  or  on  achieving  particular 
footstep  lengths  [Hodgins  and  Raibert,  1991],  without  more  generally  addressing  the  issue 
of  planning  on  partly-known  terrain.  In  this  chapter  we  propose  a  systematic  way  to  solve 
this  problem. 
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Figure  5.1  displays  the  hierarchical  control  structure  adopted  in  this  thesis,  which 
features  a  high-level  control  picking  one  of  the  low-level  controllers  at  each  step.  As 
expressed  in  Chapter  2  when  there  are  multiple  low-level  controllers  available  to  the 
robot,  discretization  yields  a  Markov  decision  process,  which  is  solved  to  obtain  a  Markov 
chain  and  to  quantify  reliability.  In  the  general  form,  policies  (high-level  controllers) 
are  functions  of  the  state  estimation  and  noisy  terrain  information.  Simpler  policies  are 
special  cases  in  this  setting.  The  goal  is  to  obtain  robust,  near-optimal  control  policies 
for  low-level  controllers  using  dynamic  programming  tools  [Bellman,  1957],  [Saglam  and 
Byl,  2014c], 


High-Level 

Controller 


Low-Level 

Controllers 


C- 

10 

C- 

-5 

r 

□ 

C5 


C10 


Figure  5.1:  Hierarchical  control  structure.  At  each  step,  the  high-level  controller  picks 
a  low-level  controller  to  use  given  state  estimation  and  optional  noisy  terrain  informa¬ 
tion.  In  this  figure  there  are  five  low-level  controllers  available.  Ci  denotes  the  low-level 
control  optimized  for  a  normally  distributed  terrain  with  mean  /i7  =  i  degrees  and 
standard  deviation  <r7  =  5°  using  Chapter  4’s  method. 
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Let  Q  denote  the  low-level  control  optimized  for  a  normally  distributed  terrain  with 
mean  /i7  =  i  degrees  and  standard  deviation  ay  =  5°  using  Chapter  4’s  method.  To  begin 
with,  we  use  the  controller  set  given  by  Z  =  {C-io,  C-5,  Co,  Cs,  Cio}  and  discretize  the 
dynamics  using  Algorithm  1  with  dtivr  =  1  and  (Lf  —  1°.  The  resulting  mesh  has  59,804 
states. 


5.1  Optimal  Policies 

In  this  section  we  look  at  the  benefits  of  using  environment  and  state  information  to 
switch  between  low-level  controllers.  Initially  we  assume  none  of  these  are  available  to 
the  robot,  so  the  policy  is  to  use  one  of  the  fixed  controllers.  Then  we  investigate  the 
improvements  gained  by  using  slope  or  state  information  only.  In  these  cases  the  robot 
is  said  to  do  visual  or  blind  walking  respectively.  On  the  other  hand,  in  general  a  policy 
is  a  function  of  both  slope  and  state  estimation,  so  the  policy  is  given  by 

C[n]  =7r(x[n],7[n]),  (5.1) 

as  used  in  Chapter  2.  We  refer  to  this  choice  as  sighted  walking. 

5.1.1  Fixed  Controllers 

The  simplest  policy  is  to  use  just  one  controller,  i.e., 

CM  =  0  (5.2) 

for  fixed  i.  In  this  case  T  is  calculated  as  in  (2.10)  for  a  given  /r7  and  ay.  We  then  derive 

the  expected  number  of  steps  before  failure  using  (2.12).  To  evaluate  the  performance  of 

the  individual  controllers  in  Z  depending  on  the  long-term  mean  slope,  we  fix  ay  to  be 

5  degrees.  The  results  are  demonstrated  in  Figure  5.2,  where  the  x-axis  is  /r7,  and  the 

y-axis  shows  the  expected  number  of  steps  before  failure. 
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We  note  that  the  optimal  choice  for  using  one  of  the  fixed  controllers  in  Z  depends 
on  the  long  term  mean  slope.  If  the  mean  slope  is  close  to  0  degrees,  the  fixed  controller 
policy  needs  to  be  using  (0  for  optimality. 
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Figure  5.2:  Performance  of  individual  low-level  controllers.  Let  (i  denote  the  low-level 
control  optimized  for  a  normally  distributed  terrain  with  mean  /i7  =  i  degrees  and 
standard  deviation  ay  =  5°  using  Chapter  4’s  method.  The  controllers  shown  in  this 
picture  are  C-iO)  C-5)  Co ?  and  (to  from  left  to  right.  The  expected  numbers  of  steps 
before  falling  are  calculated  using  (2.12)  versus  //7.  Slopes  ahead  of  the  robot  are 
assumed  to  be  normally  distributed  with  oy  =  5  deg. 
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5.1.2  Visual  Walking 

After  seeing  that  the  performance  of  the  individual  controllers  depends  significantly 
on  the  mean  slope  ahead,  we  consider  policies  using  only  one-step  lookahead  (yfn])  in¬ 
formation,  which  are  in  the  form  of 


CH  =  w(7[n]). 


(5.3) 


A  trivial  idea  is  to  look  at  the  slope  ahead  and  using  the  controller  which  was  opti¬ 
mized  for  the  nearest  slope.  In  other  words,  Ti{^[n\)  =  Q  and  i  =  c(y,  {—10,  —5,  0,  5, 10}), 
where  function  c  is  as  defined  in  (3.2).  To  illustrate,  we  use  controller  £0  when  —2.5°  < 
7  <  2.5°.  We  present  the  performance  of  this  trivial  policy  in  Figure  5.3.  It  is  rather  sur¬ 
prising  to  see  how  bad  this  intuitive  policy  turns  out  to  be.  To  improve  the  performance, 
we  used  genetic  algorithm  to  obtain  the  visual  policy  given  by 


vr(7[n])  =  { 


■T", 

o 

C-5, 

—28.5' 

Co, 

-17.5' 

C5, 

24.5' 

C10, 

29.5' 

7  <  -28.5° 

<  7  <  -17.5° 

<  7  <  24.5° 

<  7  <  29.5° 

<  7, 


(5.4) 


which  performs  better  when  yU7  =  0. 

Although  the  improvements  gained  by  using  slope  estimation  seem  to  be  small  in 
this  specific  case,  we  later  discover  in  the  following  sections  that  single-step  lookahead  to 
terrain  is  very  useful  when  used  together  with  state  information. 
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Mean  Slope  of  the  Terrain  Ahead  (p7)  (degrees) 

Figure  5.3:  Performance  of  a  two  visual  policies.  The  trivial  high-level  controller  uses 
7r(7[n])  =  Q  and  i  =  c( 7,  {—10,  —5,  0,  5, 10}),  where  function  c  is  as  defined  in  (3.2). 
To  illustrate,  it  picks  £0  when  —2.5°  <  7  <  2.5°.  Visual  policy  is  defined  in  (5.4). 
See  Figure  5.2  to  distinguish  five  fixed  controllers  shown  in  this  picture.  The  expected 
numbers  of  steps  before  falling  are  calculated  using  (2.12)  versus  /v7.  Slopes  ahead  of 
the  robot  are  assumed  to  be  normally  distributed  with  u7  =  5  deg. 
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5.1.3  Blind  (to  the  terrain)  Walking 


In  practice  the  robot  knows  its  state  and  it  is  typical  to  assume  policy  to  be  a  function 
of  the  state  in  Markov  Decision  Processes,  i.e., 

CM  =  Tr(x[n]).  (5.5) 

The  optimal  policy  is  commonly  solved  using  value  iteration  [Bellman,  1957],  which  works 
by  recursively  calculating 

V{i)  :=  max  j  ^  Ptj{ ()  ( R(j )  +  a  V{j))  J  ,  (5.6) 

where  V  is  the  value,  PtJ(C)  is  the  probability  of  transitioning  from  Xi  to  x3  when  (  is 
used,  R(j )  is  the  reward  for  transitioning  to  Xj,  and  a  G  [0, 1]  is  the  discount  factor, 
which  is  chosen  to  be  0.9.  This  equation  is  iterated  until  convergence  to  get  the  optimal 
policy. 

Remember  that  the  failure  state  is  X\.  The  value  of  the  failure  state  is  initially  set  to 
zero,  i.e., 

v(l)  =  o,  (5.7) 

and  it  always  stays  as  zero,  because  the  reward  for  taking  a  successful  step  is  one,  while 
falling  is  zero,  i.e., 

0,  j  =  l, 

m  =  {  (5.8) 

1,  otherwise. 

Note  that  the  reward  function  we  use  does  not  depend  on  the  controller,  slope  ahead, 
or  current  state.  Use  of  more  sophisticated  reward  functions  (e.g.,  considering  energy, 
speed,  step  width)  is  a  topic  of  Section  5.3  and  [Saglarn  and  Byl,  2014b].  However,  our 
initial  focus  is  only  on  stability  in  this  chapter.  Substituting  (5.7)  and  (5.8)  into  (5.6) 
yields 

f  „  1 

(5.9) 


V(i)  :=  max  <  Pv(0  i1  +  «  VU)) 
C  1  j/i 
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The  probability  of  ‘transitioning  from  ay  to  Xj  when  £  is  used’  is 


e«(C)  =  EpfW  TSM-  (5-i°) 

7er 

So  we  can  alternatively  write  the  value  iteration  algorithm  as 

v(i):=  max  1  £  Pr(7)  (1  +  aV(j))  1.  (5.11) 

l  7er  jjt i  J 

Optimization  results  for  /x7  =  0  and  cr7  =  5  deg  are  shown  in  Figure  5.4.  In  the  light  of 
this  figure,  we  conclude  that  even  blindly  switching  between  the  controllers  may  provide 
dramatic  improvements  on  the  overall  performance.  However,  it  is  also  important  to  note 
that  we  have  not  studied  discretization  errors  yet.  So,  Figure  5.4  may  be  overselling  the 
improvements  obtained  by  blind-to-the-terrain  switching. 


Figure  5.4:  Blind  walking.  The  high-level  controller  knows  the  state  but  it  is  blind 
to  the  environment.  Value  iteration  is  used  to  get  the  optimal  policy.  See  Figure  5.2 
to  distinguish  five  fixed  controllers  shown  in  this  picture.  The  expected  numbers  of 
steps  before  falling  are  calculated  using  (2.12)  versus  //7.  Slopes  ahead  of  the  robot 
are  assumed  to  be  normally  distributed  with  cr7  =  5  deg. 
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5.1.4  Sighted  Walking 

In  sighted  walking  the  high-level  controller  uses  both  the  state  information  and  one- 
step  lookahead  to  terrain,  i.e.,  the  policy  is  given  by  (5.1).  To  use  the  one-step  lookahead 
in  deriving  policy,  we  modify  the  value  iteration  in  (5.11)  as 

V(i)-.=  J2nT  jpr(7)  £  7.0  (1  +  a  V(i))l.  (5,12) 

7er  l  j/1  J 

So,  maximization  operation  is  carried  for  each  slope  in  T. 

Instead  of  modifying  the  value  iteration  algorithm,  we  could  define  a  new  11D  state, 
including  the  slope  in  addition  to  the  10D  state  of  the  robot.  However,  (5.12)  makes 
the  analysis  of  the  following  parts  easier,  reduces  computational  cost,  and  requires  less 
memory.  Note  that  Hr (7)  gives  the  probability  of  ‘the  slope  ahead  being  7’  and  Tfj( 7,  £) 
is  the  probability  of  ‘transitioning  from  xt  to  x3  when  (  is  used  and  the  slope  is  7’. 

We  optimize  for  /x7  =  0  and  <r7  =  5  degrees  to  get  the  performance  shown  in  Figure  5.5. 
Noting  the  logarithmic  y-axis,  it  is  clear  that  sighted  walking  is  significantly  better  than 
visual  and  blind  walking,  as  one  would  intuitively  expect.  The  ability  to  quantify  this 
intuition  is  a  driving  goal  of  our  work. 

Although  the  optimal  high-level  control  in  Figure  5.5  is  obtained  by  optimizing  for 
/x7  =  0  and  <r7  =  5  degrees,  the  figure  demonstrates  that  it  also  performs  well  for  /x7  ^  0. 
In  the  following  sections  we  observe  that  the  same  argument  follows  for  <r7  ^  5  degrees. 

The  dramatic  improvement  gained  by  using  one-step  terrain  lookahead  along  with 
state  information  motivates  measuring  how  much  a  two-,  three-  and  infinite-step  looka¬ 
head  increases  the  stability,  which  is  a  topic  of  future  work. 
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Figure  5.5:  Sighted  walking.  The  high-level  controller  uses  both  the  state  information 
and  one-step  lookahead  to  terrain.  See  Figure  5.2  to  distinguish  five  fixed  controllers 
shown  in  this  picture.  The  expected  numbers  of  steps  before  falling  are  calculated  using 
(2.12)  versus  /i7.  Slopes  ahead  of  the  robot  are  assumed  to  be  normally  distributed 
with  cr7  =  5  deg. 
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5.2  Robustness  of  Switching 

Although  Figure  5.5  is  impressive,  for  this  methodology  to  be  applicable  to  real-life 
problems,  policies  must  be  robust  to  uncertainties.  In  particular,  considering  the  noise 
in  the  inputs  of  high-level  controller  is  crucial.  So,  in  this  section  we  study  and  improve 
robustness  of  sighted  walking  to  terrain  estimation  and  state  information.  The  results 
motivate  experimenting  the  methods  and  controllers  of  this  thesis  on  a  real  robot. 

5.2.1  Noisy  Terrain  Estimation 

We  start  our  robustness  study  by  considering  the  addition  of  noise  to  slope  informa¬ 
tion.  The  slope  ahead  is  still  defined  by  variable  7,  but  let’s  say  the  controller  thinks  it 
is  (closest  to)  7  6  T,  due  to  the  noise  /  G  T.  The  relationship  is  given  by 

7  =  max{min(T),  min(max(T),  7  +  /)).  (5.13) 

Note  that  this  equation  simply  says  7  =  7  +  /  except  at  boundaries  of  the  slope  set.  Pl{1) 
is  defined  by 

Pl(1)  ■=  Pr(l[n ]  =  /),  (5.14) 

and  normally  distributed  with  zero  mean  and  standard  deviation  07,  i.e., 

l[n]  ~  Af(0,  of).  (5.15) 

In  the  presence  of  noise,  the  policy  is  a  function  of  7,  not  7.  Thus,  we  have 

C[n]  =  7r(x[n],7[n]).  (5.16) 

Then  the  stochastic  state-transition  matrix  in  (2.10)  becomes 

T  =  Y.  Pl(1)  Td( 7,^,7)).  (5.17) 

7er  ie r 
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To  account  for  noise  in  the  slope  information  while  optimizing,  we  first  rewrite  the 
modified  value  iteration  algorithm  as 

V(S)  :=X/ncax  |  Pv{l)P{h  7)  T£(j,C)  (l  +  a^(j))l,  (5.18) 

7er  l.  7er  j^i  ) 

where  P( 7,7)  is  the  probability  of  ‘actual  slope  being  7  when  robot  thinks  it  is  7’.  This 

probability  is  given  by 

P(7,7)  =  ^Pl(0,  (5.19) 

lei1 

where  T  =  {l  e  T  |  7  =  max(min(T) , min(max( T),7  +  /))}. 

Note  that  in  this  setting  the  sighted  policy  of  Figure  5.5  is  a  special  case  that  is 
obtained  assuming  no  sensor  noise,  i.e.,  cp  =  0°.  Figure  5.6  shows  what  happens  to  this 
policy  in  the  presence  of  noise.  As  cp  increases  we  rapidly  start  performing  worse  than  the 
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Figure  5.6:  Noisy  terrain  estimation.  Sighted  policy  from  Figure  5.5  performs  poorly 
with  cp  7^  0.  Considering  noise  while  optimizing  provides  robustness  while  keeping 
almost  all  the  optimality.  The  expected  numbers  of  steps  before  falling  are  calcu¬ 
lated  using  (2.12)  versus  cp.  Slopes  ahead  of  the  robot  are  assumed  to  be  normally 
distributed  with  /r7  =  0  and  cr7  =  5  deg.  Fixed  controller  £0  is  drawn  for  reference. 
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fixed  controller  £0-  The  figure  also  demonstrates  that  considering  noise  while  optimizing 
provides  robustness  to  terrain  estimation  while  mostly  keeping  the  optimality.  Although 
optimizing  for  very  high  oy  values  seem  tempting  in  this  graph,  it  is  important  to  note 
that  we  have  not  considered  discretization  errors  yet  and  as  we  assume  higher  oy  while 
optimizing,  the  high-level  controller  tends  to  rely  more  on  the  state  information. 

Next,  we  assume  the  sensor  noise  is  given  by  oy  =  1°.  In  practice  this  number  can  be 
easily  and  conservatively  calculated.  The  performance  of  the  high-level  control  optimized 
for  (j i  =  2°  versus  mean  slope  is  presented  in  Figure  5.7. 


Mean  Slope  of  the  Terrain  Ahead  (/x7)  (degrees) 

Figure  5.7:  A  policy  which  is  robust  to  sensor  noise.  Comparing  with  Figure  5.5,  we 
conclude  that  we  obtain  nearly  optimal  performance  even  under  the  presence  of  sensor 
noise  in  terrain  estimation.  See  Figure  5.2  to  distinguish  five  fixed  controllers  shown  in 
this  picture.  The  expected  numbers  of  steps  before  falling  are  calculated  using  (2.12) 
versus  /v7.  Slopes  ahead  of  the  robot  are  assumed  to  be  normally  distributed  with 
<j7  =  5  deg.  Slope  estimation  experiences  a  normal  noise  with  zero  mean  and  standard 
deviation  oy  =  1°. 
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5.2.2  Discretization  Errors 

Up  until  now,  we  optimized  policies  and  evaluated  resulting  performance  using  the 
same  (dthr  =  1)  mesh.  In  this  section,  we  keep  optimizing  on  the  coarse  (dthr  =  1)  mesh, 
but  we  estimate  performance  using  a  denser,  more  refined  ( dthr  —  0.7)  mesh,  intended 
to  better  approximate  the  true  system  dynamics.  The  high-resolution  mesh,  which  has 
199,358  states,  is  used  to  tune  the  real  performance  of  high-level  control  and  ensure  its 
robustness  to  discretization  errors.  First,  we  need  to  explain  how  the  coarse-mesh  policy 
can  be  applied  when  the  noisy  slope  ahead  estimation  7 [n]  may  not  be  in  the  slope  set, 
and/or  state  x [n]  may  not  be  in  the  mesh.  In  these  cases,  we  apply  the  most  basic  idea 
formulated  by 

CM  =  7r(c(x[n],  X),c(7[n],r)),  (5.20) 

where  function  c  is  as  defined  in  (3.2).  In  its  general  form,  let  us  denote  the  dense  mesh 
by  Xd,  which  is  obtained  from  a  slope  set  Using  this  mesh,  we  can  approximate  how 
(5.20)  would  behave  in  practice  by 

CM  =  n(c(xd[n\,X),c(%[n\,T)),  (5.21) 

where  xd  G  Xd  and  77  G  Td.  The  definition  and  calculation  of  T  remain  the  same, 
but  this  time  Xd  and  T^  is  used  in  obtaining  it.  Figure  5.8  shows  that  the  policy  from 
Figure  5.7  performs  poorly  when  evaluated  on  the  dense  mesh.  We  must  once  again  refine 
our  algorithm,  this  time  to  improve  robustness  to  meshing  discretization. 

In  our  approach  to  fix  this  issue,  we  consider  the  following:  While  the  actual  state 
is  Xi,  the  robot  may  think  it  is  27.  To  make  this  clear,  we  rewrite  the  value  iteration 
algorithm  as 

v '(*):=  Z>a*  {  £  Pr(l)Ph,i)  £  P(i,  h)  £  1^7,0  (l+aC(j))l,  (5.22) 

-yer  l7er  *  j^i  J 

where  P(i,  k)  is  the  probability  of  ‘actual  state  being  Xi  when  robot  thinks  it  is  Xk  ■ 
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Figure  5.8:  Performance  evaluation  on  the  dense  mesh.  The  policy  from  Figure  5.7 
performs  poorly.  However,  discretization  errors  can  be  handled  to  obtain  the  robust 
policy  shown  in  this  graph.  Compare  to  Figure  4.2,  where  the  stability  of  low-level 
controller  was  verified  with  Monte  Carlo  simulations.  The  expected  numbers  of  steps 
before  falling  are  calculated  using  (2.12)  versus  ct7.  Slopes  ahead  of  the  robot  are  as¬ 
sumed  to  be  normally  distributed  with  /r7  =  0.  Sensor  noise  is  given  by  07  =  1°.  Fixed 
controller  £0  is  drawn  for  reference.  Numerical  limit  at  1012  is  due  to  machine  preci¬ 
sion,  and  it  corresponds  to  a  10  thousand  tours  around  the  world  for  a  human-sized 
robot  with  half  a  meter  step  length. 


Finding  the  best  calculation  for  P(i,  k )  is  an  open  question.  However,  it  is  intuitive 
that  for  a  given  state  k  (the  robot  thinks  the  state  is  Xk),  P(i,  k)  should  be  smaller  for 
i  for  which  d(xi,Xk )  is  larger.  Among  many  other  methods,  inverse  distance  weighting 
as  in  [Shepard,  1968]  did  not  provide  desirable  performance.  In  [Saglam  and  Byl,  2014c], 
we  illustrate  that  an  exponential  distribution  works  well.  In  this  thesis,  to  show  the 
applicability  of  various  distributions,  we  use  power  distribution  given  by 


P(i,k)  = 


ECAC 


Ac(l-A), 


(5.23) 


where  xt  is  the  cth  closest  state  to  Xk  and  0  <  A  <  1  is  the  distribution  parameter1. 


1  Assume  0°=1  when  using  (5.23). 
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Note  that  this  is  just  a  power  distribution  scaled  to  have  P(h  k)  =  1.  As  we  increase 
A,  robustness  increases  but  performance  drops.  A  =  0  would  mean  P(i,i )  =  1  and 
P(i,  k)  =  0  for  i  ^  k,  i.e.,  what  we  had  before  this  section.  A  =  1  would  try  to  consider 
all  points  in  the  mesh2  with  equal  probability.  Then  the  high-level  controller  would  not 
make  use  of  state  information,  so  it  would  be  a  visual  policy. 

To  derive  final  high-level  control  policy,  we  again  use  07  =  2°  for  optimization  and 
quickly  tune  A  =  0.5  by  trial  and  error  using  the  dense  mesh.  The  resulting  performance 
is  demonstrated  in  Figure  5.8.  The  stability  of  the  high-level  controller  turns  out  to 
be  more  than  2  orders  of  magnitude  depending  on  the  terrain  roughness.  To  illustrate, 
for  /x7  =  0°  and  <r7  =  3.5°,  while  the  most  stable  low-level  controller  (£0)  is  expected 
to  take  1.4xl07  steps,  the  high-level  controller  increases  this  number  to  1.8xl09.  As 
in  Figure  4.2,  we  conduct  Monte  Carlo  simulations  for  higher  <r7  values  as  shown  in 
Figure  5.8  and  Table  5.1.  The  close  matches  verify  the  performance  of  the  final  high-level 
control  policy.  To  compare  it  with  all  the  low-level  controllers  available  to  the  robot,  we 
present  Figure  5.9,  where  we  assumed  /x7  =  0  and  <r7  =  5  deg. 

Table  5.1:  Monte  Carlo  simulations  for  final  high-level  control  policy.  Slopes  are  nor¬ 
mally  distributed  with  /r7  =  0  and  cr7  =  5  deg.  One  step  look  ahead  experiences  a 
normal  noise  with  zero  mean  and  standard  deviation  07  =  1°.  10  thousand  iterations 
were  conducted  until  failure  and  the  results  are  presented  on  the  last  row.  Estimation 
using  (2.12)  is  carried  on  the  dense  mesh. 


cr7  =  7° 

00 

0 

<t7  =  9° 

<t7  =  10° 

Estimation  using - — 

1  —  A2 

637.4 

167.9 

66.45 

33.99 

Monte  Carlo  simulations 

633.23 

171.41 

68.35 

35.24 

2Note  that  c  is  bounded  between  zero  and  number  of  states  minus  1. 
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Mean  Slope  of  the  Terrain  Ahead  (/i7)  (degrees) 

Figure  5.9:  Stability  of  the  final  high-level  policy.  For  this  specific  roughness  the  per¬ 
formance  is  an  order  of  magnitude  improvement  compared  to  the  most  stable  low-level 
controller.  See  Figure  5.2  to  distinguish  five  fixed  controllers  shown  in  this  picture. 
The  expected  numbers  of  steps  before  falling  are  calculated  using  (2.12)  versus  /i7. 
Slopes  ahead  of  the  robot  are  assumed  to  be  normally  distributed  with  <r7  =  5  deg. 
Slope  estimation  experiences  a  normal  noise  with  zero  mean  and  standard  deviation 

<n  =  i°. 
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5.3  Incorporating  Secondary  Metrics 

In  section  4.4  we  showed  how  to  incorporate  metrics  like  speed  and  energy  efficiency 
into  low-level  controller  optimization.  Here  we  repeat  the  same  process  for  high-level 
control  as  in  [Saglam  and  Byl,  2014b],  As  case  demonstration  in  this  chapter,  we  employ 
the  two  controllers  in  Table  4.3,  one  of  which  is  more  energy  efficient  but  less  stable. 
Intuitively  the  high-level  controller  should  prefer  the  efficient  controller  when  the  slope 
ahead  is  mild  and  state  is  appropriate.  For  stability,  it  better  pick  the  more  stable  low- 
level  controller  otherwise. 

For  simplicity  and  proof  of  concept,  in  this  section  we  neglect  the  sensor  noises  and 
discretization  errors.  To  incorporate  energy  efficiency  into  high-level  control  design,  unlike 
(5.6)  and  (5.12),  we  use 

V(i):=  £max  |Pr(7)  £  Iy(7,C)  (flyO.C)  +  “  V<J))  j  •  (5-24) 

7er  c  l  j  J 

The  reward  is  given  by 

0,  j  —  1, 

^•(7,0=  <  (5.25) 

I  1  —  /3  COT(xj,  7,  C),  otherwise, 

where  COT(a7,7,C)  is  the  cost  of  transport  when  the  robot  takes  a  step  from  state  Xi 
using  controller  (  and  the  slope  ahead  is  7.  Figure  5.10  shows  the  performance  of  high- 
level  controller  obtained  using  —  1.  It  is  as  stable  as  the  robust  controller  and  almost 
as  energy  efficient  as  the  other  controller.  To  derive  the  expected  cost  of  transport,  we 
computed  the  expected  energy  expenditure  and  expected  distance  using  the  methodology 
explained  in  Appendix  A. 3. 

Comparing  Figure  5.10  and  Table  4.3,  we  note  that  the  relatively  energy  efficient 
low-level  controller  turned  out  to  be  a  little  off  from  what  is  calculated  before,  but  the 
results  are  fairly  close. 
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Expected  Cost  of  Transport 

Figure  5.10:  Incorporating  secondary  metrics  on  high-level  control  design.  The 
low-level  controllers  are  as  listed  in  Table  4.3.  The  high-level  controller  is  stable  as 
the  robust  low-level  controller  and  almost  as  energy  efficient  as  the  other  low-level 
controller.  To  derive  the  expected  cost  of  transport,  we  first  needed  to  calculate  ex¬ 
pected  energy  expenditure  and  expected  distance  using  the  methodology  explained  in 
Section  A. 3.  The  expected  numbers  of  steps  before  falling  are  calculated  using  (2.12). 
Slopes  ahead  of  the  robot  are  assumed  to  be  normally  distributed  with  /r7  =  0  and 
cr7  =  5  deg. 


Overall  this  chapter  motivates  the  use  of  high-level  controller  for  increased  autonomy 
and  better  performance,  including  stability,  speed,  and  energy  efficiency. 
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Conclusions  and  Future  Work 


Toward  capable  robots  operating  reliably  in  real-world  environments,  this  thesis  quanti¬ 
fied  robustness  to  variations  with  an  intuitive  and  meaningful  metric,  which  is  later  used 
to  optimize  and  benchmark  a  given  control  scheme.  Specifically,  we  introduced  extended 
hybrid  zero  dynamics  framework  which  produced  quantitatively  good  performance  for 
two-legged  walking  on  rough  terrain. 

The  proposed  mathematical  tool  in  this  thesis  is  based  on  discretization  of  dynamics 
to  treat  the  system  as  a  Markov  chain  representation  which,  in  addition  to  reliability,  was 
used  to  calculate  expected  speed  and  energy  consumption  under  variations  such  as  rough 
terrain.  As  importantly,  the  process  of  learning  the  discretized  system  provided  extremely 
valuable  information  about  the  dynamics  which  was  later  used  to  obtain  high-level  be¬ 
havioral  algorithms  for  autonomous  walking.  Overall,  this  thesis  proposes  a  systematic 
way  of  obtaining  controllers  that  are  autonomous  and  reliable  in  a  quantifiable  manner. 

On  top  of  obtaining  a  Markov  chain  representation  in  Chapter  2,  Appendix  A  ex¬ 
plained  our  approach  with  toy  examples  to  motivate  researchers  to  employ  our  method¬ 
ology.  These  newly  designed  tools  arc  highly  applicable  to  a  wide  range  of  robotic  systems 
such  as  3D  bipeds,  hopping,  running,  and  experimental  robots.  Moreover,  many  interest- 


61 


Conclusions  and  Future  Work 


Chapter  6 


ing  applications  can  be  identified  and  addressed  in  various  other  dynamical  systems  that 
are  highly  amenable  to  quantification  of  performance  under  stochasticity,  which  includes 
physical  and  social  graphs,  epidemics,  queues,  and  teleoperation. 

One  of  two  major  future  directions  is  to  exploit  this  mentioned  applicability.  Sec¬ 
ond,  we  would  like  to  keep  developing  the  tools  of  this  thesis  to  advance  the  science  of 
highly-dynamic  robots.  In  particular,  for  two-legged  walking  we  are  interested  in  more 
realistic  and  higher  dimensional  biped  and  terrain  models.  The  former  can  potentially  be 
a  3D  walker  with  complex  foot,  while  the  latter  can  consider  varying  friction  coefficient, 
obstacles,  and  holes  to  be  avoided.  The  framework  is  already  capable  of  measuring  the 
advantages  of  additional  control  actions  such  as  reflexes.  We  also  would  like  to  study  the 
benefits  of  more  than  one-step  lookahead  to  the  terrain.  Very  interestingly,  our  powerful 
optimization  technique  allows  directly  mapping  terrain  estimation  and  state  information 
to  torques,  which  is  to  be  exploited  in  a  future  work.  Incorporating  more  advanced  ma¬ 
chine  learning  techniques  has  potential  to  provide  even  better  autonomy  and  robustness 
to  the  system. 
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Many  interesting  applications  can  be  identified  and  addressed  in  various  dynamical  sys¬ 
tems  that  are  highly  amenable  to  quantification  of  performance  under  stochasticity,  in¬ 
cluding  robots,  physical  and  social  graphs,  epidemics,  and  teleoperation.  Often  for  such 
systems,  eigenalysis  yields  a  meaningful  and  intuitive  measure  of  overall  stability,  namely 
system-wide  mean  first  passage  time,  which  we  mainly  refer  to  as  expected  number  of 
steps  before  failure  in  this  thesis. 

First  passage  time,  aka  first  hitting  time,  gives  survival  duration  until  a  specific  event 
or  set  of  events,  such  as  death  or  failure.  In  discrete-time  models,  the  expected  number 
of  discrete  time  steps  of  survival  corresponds  to  mean  first  passage  time  (MFPT).  While 
different  initial  conditions  (states)  often  result  in  different  MFPTs,  for  many  metastable 1 
systems  a  scalar  called  system-wide  MFPT  is  an  accurate  estimate  across  a  large  set  of 
states,  and  it  can  potentially  well  represent  the  stability. 

More  recently,  the  tools  for  quantifying  metastable  systems  has  been  applied  to  walk¬ 
ing  robots  to  predict  how  a  robot  performs  over  variable  terrain  for  a  given  control  policy 
[Byl  and  Tedrake,  2009],  [Saglam  and  Byl,  2014c].  For  such  analyses,  the  walking  robot, 
1See  Chapter  2  for  metastability. 
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the  environment,  the  system  noise,  and  the  control  actions  can  be  modeled  together  as  a 
Markov  chain  as  explained  in  Chapter  2.  Assuming  that  the  initial  state  of  the  robot  lies 
within  “metastable  region1'  of  state  space,  the  eigenvalues  of  the  state  transition  matrix 
of  the  Markov  chain,  specifically  the  largest  eigenvalue  not  associated  with  the  (absorb¬ 
ing)  failed  system  state,  can  be  used  to  predict  the  number  of  steps  the  robot  can  take 
before  failing. 

Using  a  Europe  tour  game,  this  chapter  explains  the  robotic  locomotion  community 
the  dynamics  of  metastable  walking.  It  also  aims  to  encourage  countless  controls  appli¬ 
cations  of  MFPT  calculation  for  metastablc  systems.  Finally,  a  more  generalized  concept 
of  first  passage  time,  namely  first  passage  value,  is  presented  to  discuss  both  the  mean 
and  variability  of  a  value  of  interest  for  a  metastable  system. 


A.l  Toy  Example:  Europe  Tour 


Consider  a  person  traveling  between  some  of  the  largest  cities  in  Europe  shown  in 
Figure  A.l.  After  spending  a  day  that  person  either  stays  in  the  same  city,  or  moves 
to  one  of  the  connected  cities.  The  probability  of  action  is  directly  proportional  to  the 
population  of  the  next  city.  For  instance,  when  in  Madrid, 


Probability  of  staying  in  Madrid  = 


Population  of  Maclird 


Population  of  Madird  +  Population  of  Paris 


(A.l) 


For  the  Europe  tour  game,  Istanbul  represents  failure2  for  a  robot.  The  number  of  days 
before  reaching  Istanbul  is  analogous  to  the  number  of  steps  before  failure  for  a  walker. 
Each  of  the  remaining  cities  corresponds  to  a  (discretized)  state  of  the  robot.  For  an 
explanation  of  discretization,  please  see  Chapter  2. 


2Istanbul  may  also  represent  success  depending  on  the  application.  See  epidemics  example  of  this 
appendix.  In  general  we  are  interested  in  expected  number  of  steps  before  escape  (failure  or  success). 
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Figure  A.l:  Populations  of  Europe’s  most  populated  8  cities  excluding  Russia.  Cities 
are  ordered  by  their  population  as  indicated  in  parenthesis.  For  instance,  Athens  (State 
3)  is  the  third  most  crowded  city  in  this  map.  Consider  a  person  traveling  between 
these  cities.  After  spending  a  day  that  person  either  stays  in  the  same  city,  or  moves 
to  one  of  the  connected  cities.  The  probability  of  action  is  directly  proportional  to  the 
population  of  the  next  city.  The  resulting  Markov  Chain  has  similarities  with  walking 
dynamics. 

The  author  would  like  to  thank  Beril  Pisgin  for  drawing  this  picture. 


Note  that  as  time  goes  to  infinity,  the  probability  of  visiting  Istanbul  becomes  1. 
However,  if  the  population  of  Paris  was  infinite,  then  being  in  Paris  would  be  stable  and 
the  tourist  would  spend  infinitely  many  days  without  going  to  Istanbul.  This  behavior 
is  analogous  to  periodic  walking  on  flat  terrain.  For  an  appropriate  initial  condition,  the 
robot  would  never  leave  the  periodic  orbit  under  deterministic  conditions,  which  implies 
walking  infinitely  many  steps  without  failure.  By  taking  a  Poincare  section  intersecting 
walking  motion,  this  periodic  orbit  can  be  represented  as  a  fixed  point  on  the  corre¬ 
sponding  Poincare  map.  So,  Paris  is  analogous  to  the  fixed  point  on  the  Poincare  map 
for  the  robot  walking  in  terms  of  mapping  back  to  itself  at  each  step.  Moreover,  Madrid, 
London,  Rome,  and  Berlin  are  in  the  basin  of  attraction  for  stable  walking.  Starting  from 
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any  one  of  those  cities  means  going  to  Paris.  However,  Athens  is  destined  to  be  absorbed 
rapidly,  because  it  is  only  connected  to  Istanbul.  Similar  to  a  tripped  walker,  Kiev  is  a 
risky  state,  because  it  may  lead  to  Istanbul,  which  means  failure.  Otherwise  going  Berlin 
results  in  stable  walking  at  Paris. 

For  dynamic  walking  on  rough  terrain,  metastability  is  present  if  the  number  of  steps 
before  failure  is  very  high  but  finite.  In  the  tour  example,  we  assume  the  population  of 
Paris  is  1  billion  to  study  metastability.  Then,  Istanbul  is  globally  asymptotically  stable, 
but  a  tourist  in  Paris  is  expected  to  spend  a  large  number  of  days  before  going  to  Istanbul. 
Define  reachable  (controllable)  subspace  as  the  union  of  states  that  might  transition  from 
(to)  Paris.  Then,  Paris,  Madrid,  London,  Rome,  and  Berlin  are  reachable  and  controllable 
states,  whereas  Athens  is  not  a  controllable  or  a  reachable  state.  In  addition,  imagine 
having  Barcelona  in  this  map  too,  from  which  the  tourist  can  go  to  Madrid,  but  not 
vice  versa.  Then  Barcelona  would  be  a  controllable  but  not  reachable  state.  As  following 
sections  justify,  unreachable  states  do  not  affect  the  system-wide  MFPT  value.  Thus, 
only  reachable  state  space  needs  to  be  meshed  for  a  walking  robot  and  the  curse  of 
dimensionality  can  be  avoided. 

A. 2  Absorbing  Markov  Chains 

To  calculate  the  expected  number  of  steps  before  a  specific  ‘escape’  event,  the  corre¬ 
sponding  set  needs  to  be  modeled  as  an  absorbing  ‘halt’  state.  For  the  Europe  tour  game, 
this  modeling  is  achieved  by  assuming  it  is  impossible  to  leave  Istanbul.  Remember  that 
Istanbul  is  analogous  to  the  set  of  failure  modes  for  a  walking  robot,  no  matter  how  the 
robot  failed.  Without  loss  of  generality,  let  the  halt  state  be  State  1  ( X\ )  and  note  that 
absorption  assumption  does  not  change  the  dynamics  until  State  1. 

For  the  Markov  Chain  under  consideration,  the  state  distribution  vector  at  step  n  is 
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denoted  by  p  [n]  and  defined  by 

Pi[n]  :=  Pr(X[n]  =  Xi).  (A. 2) 

So  Pi[n]  corresponds  to  the  probability  of  being  at  state  Xi  at  step  n.  Since  probability 
cannot  be  negative,  p  [n]  is  a  non-negative  vector,  and  because  the  system  has  to  be  at 
a  state  at  any  step,  p[n]  sums  to  1.  The  state  transition  matrix,  aka  the  Markov  matrix 
or  the  stochastic  matrix,  is  defined  by 

Tij  :=  Pr(X[n  +  1]  =  Xj  |  X[n]  =  x^).  (A. 3) 

So,  the  entry  of  T  on  the  ith  row  and  jth  column  gives  the  probability  of  transitioning 
from  state  xlt  to  state  xr  To  illustrate,  the  Markov  chain  for  the  Europe  tour  game  is 


represented  with 

j 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0.0082 

0 

0.0035 

0 

0 

0 

0.9883 

0.7905 

0 

0.2095 

0 

0 

0 

0 

0 

0.0138 

0.0081 

0 

0.0034 

0 

0.0028 

0 

0.9720 

T  = 

0 

0 

0 

0 

0.0032 

0 

0 

0.9968 

.  (A. 4) 

0.6899 

0 

0 

0.1714 

0 

0.1387 

0 

0 

0.0139 

0 

0 

0 

0 

0 

0.0027 

0.9833 

0 

0.0082 

0 

0.0035 

0.0032 

0 

0.0027 

0.9825 

Indices  on  Figure  A.l  yield  that  T, 64  =  0.1714  is  the  probability  of  moving  from  Kiev  to 
Berlin.  Just  like  p  [n] ,  T  is  non- negative.  And  because  any  state  transitions  (possibly  to 
the  halt  state  or  the  starting  state  itself)  after  each  step,  each  row  sums  to  one.  Let  l  >  1 
be  the  number  of  states.  So,  the  state  transition  matrix  is  l  by  t.  The  state  transition 
matrix  gives  the  next  state  distribution,  given  the  current  one  by 

p[n  +  l]=T'p[n]  =  (T')n+1P[0], 
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where  the  prime  (')  symbol  denotes  the  transpose  operation. 

Let  A  be  an  eigenvalue  of  T.  Then,  there  exists  a  non-zero  vector  v  such  that 

Tv  =  Av.  (A. 6) 

As  in  [Matthews,  1995],  let  k  be  such  that  | Vj |  <  \vk\  for  all  1  <  j  <  t.  Equating  the  k- th 
components  in  equation  (A. 6)  gives 

E  T*3V3  =  XVk'  (A-7) 

j 

As  a  result, 

I At'fc |  =  | A |  \vk\  =  y  ]  TkjVj  <  y  ] Tkj\vj\  <  y  ^Tkj |nfc|  =  \vk\,  (A. 8) 

j  j  j 

where  Tkj  >  0  and  ]>M  Tkj  =  1  are  used.  |A||ufc|  <  |ufc |  implies  |A|  <  1. 

For  the  rest  of  this  chapter,  the  transpose  of  T  is  used  to  make  the  following  sections 
easier  to  follow.  Since  T  is  square,  T'  has  the  same  eigenvalues  as  T.  Due  to  the  nature 
of  transpose  operation  and  the  structure  of  T,  each  column  of  T'  sums  to  one. 

Remember  that  X\  is  an  absorbing  state,  which  represents  the  end  of  game.  Then,  T' 
can  be  partitioned  as 

Exi  Ti 

(A.9) 

0  T 

Note  that  A  =  1  and  v  =  [1  0  ...  0]'  satisfies  the  equation 

T'v  =  Av.  (A. 10) 

To  distinguish  (possibly  non-distinct)  eigenvalues,  note  them  by  A  j,  where  1  <  j  <  £. 
Without  loss  of  generality,  let  Ai  =  1  and  the  associated  basis  vector  be  vx  =  [1  0  ...  0]'. 

Existence  of  Jordan  normal  form  for  any  square  matrix  is  fundamental  to  Linear 
Algebra.  Consider  a  Jordan  normal  form  of  T  given  by 

f  =  vjv-1, 
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Then,  as  will  be  verified,  a  Jordan  normal  form  of  T'  is  given  by 

T  =  VJV-1, 


(A. 12) 


where  J 


1  0 
0  J 


(A. 13) 


and  V 


1  -[1  ...  1}V 

0  V 


(A. 14) 


Note  that  the  sum  of  each  column  of  V  equals  zero,  except  the  first  one.  Furthermore, 
these  columns  form  a  basis  in  R  .  Equation  (A. 12)  can  be  verified  as  follows.  The  inverse 
of  V  is 


V-1 


1  [1  ...  1] 

0  V -1 


(A. 15) 


Then,  the  right  hand  side  of  (A.  12)  can  be  calculated  as 


1  [1 .. 

,.  1]  -  [1  ...  ljVJV-1 

1  [1 ., 

••  l](I-f) 

0 

VJV  1 

0 

f 

(A. 16) 


Equation  (A. 12)  is  thus  verified  because  Ti  +  [1  ...  1]T  =  [1  ...  1]  (columns  of  T'  sum  to 
one). 

The  spectrum  of  T,  denoted  by  cr(T),  is  the  set  of  distinct  eigenvalues  of  T.  The 
spectral  radius  of  T  is  given  by 


p(T )  =  max  |  A  | .  (A. 17) 

Ae<r  (T) 

Let  the  spectral  radius  be  r  =  p{T).  As  proven  in  [Meyer,  2000],  because  T  is  a  non¬ 
negative  square  matrix, 

1.  r  G  a(T)  (r  is  an  eigenvalue  of  T), 
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2.  Tz  =  rz  for  some  z  G  AT  =  {u|  v  >  0  with  v  ^  0}. 

Let  A2  :=  r  and  v2  refer  to  the  associated  column  in  V.  Then,  v2  =  [—  ||z|| x  z']A  Now, 
consider  the  state  distribution 

1  ,  A 
=  vi  +  -t— v2.  (A.18) 

lzlli 

<fi  is  called  the  metastable  distribution.  Note  that  it  is  a  valid  initial  state  distribution 
since  it  sums  to  one  and  each  element  is  non-negative.  For  the  metastable  Europe  tour, 

0 

0.0081 

0 

K  0.0035 

A2  «  1  —  9.2393  x  10"5  and  (j)  «  (A.  19) 

0.0031 

1.1  x  10~5 

0.0027 

0.9826 

A2  being  close  to  one  means  escapes  are  rare  and  the  system  is  metastable.  The  first 
element  of  <fi  is  the  probability  of  being  at  aq,  and  it  equals  zero  by  definition.  In  this 
metastable  distribution,  the  probability  of  being  in  Paris  is  high  due  to  its  high  pop¬ 
ulation.  However,  Athens  does  not  appear  in  the  metastable  distribution,  because  it  is 
connected  only  to  the  absorbing  state.  The  probability  of  being  in  Kiev  is  close  to  zero, 
because  several  steps  are  typically  enough  to  move  directly  from  Kiev  to  either  Istanbul 
(the  absorbing  state)  or  Berlin.  The  latter  almost  implies  going  to  Paris  in  the  following 
step  due  to  high  population  there. 

Taking  a  step  when  <p  is  the  initial  condition  results  in 

T'(f>  =  T'  (w-i  +  - — :pv2^  =  V!  +  - — — v2.  (A. 20) 

V  IMIi  J  INIi 
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Naturally,  the  resulting  distribution  is  also  non-negative  and  sums  to  one.  In  addition, 
the  first  component  is 

i  +  irr(-  llzlli)  = !  -  a2,  (a. 21) 

which  means  the  system  escaped  with  probability  1  —  A2.  Furthermore,  given  the  system 
did  not  escape,  (p  is  the  final  probability  distribution  for  the  states.  This  fact  can  be  seen 
by  zeroing  the  first  component  of  T'<p  and  scaling  to  sum  to  one.  So,  when  (p  is  the  initial 
state  distribution,  the  probability  of  escaping  is  1  —  A2,  the  probability  of  staying  in  the 
same  distribution  (<p)  is  A2. 

Now  let  the  initial  condition  be  <p,  that  is  p  [0]  =  <p.  The  expected  number  of  steps 
before  escape  corresponds  to  the  term  mean  first  passage  time  (MFPT)  in  [Byl  and 
Tedrake,  2009].  The  higher  MFPT  is,  the  more  stable  a  system  is  said  to  be.  Two  cases 
are  possible  depending  on  the  probability  of  taking  a  step:  If  A2  =  1,  then  the  probability 
of  escape  is  zero.  In  this  case  the  system  takes  infinitely  many  steps  without  escaping  to 
the  halt  state.  The  other  case  (A2  <  1)  is  relatively  more  complicated  and  interesting  as 
focused  next. 

A. 2.1  System-wide  First  Passage  Time 

Given  the  probability  of  taking  a  step  without  escaping  is  A2  <  1,  the  probability  of 
taking  n  steps  only,  equivalently  escaping  at  the  nth  step  is 

Pr(X[n]  =  xv,  X[n  -  1]  ^  Xl)  =  A£_1(l  -  A2).  (A.22) 

Realize  that  as  n  — »  00,  the  right  hand  side  goes  to  zero,  which  means  the  escape  is 
inevitable.  The  step  ended  up  escaping  also  counts  as  a  step,  which  can  be  verified 
considering  escaping  at  the  first  step  (taking  1  step  only).  When  n  =  1  is  substituted, 
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1  —  A2  is  obtained  as  expected.  Then,  the  expected  number  of  steps  can  be  calculated  as 


MFPT  =  E[FPT] 


X>Pr(xi  n]  =  xlt  X[ra  -  1]  ^  xi) 


(A. 23) 


^nAr'(l-A2 


1-A,’ 


where  FPT  stands  for  first  passage  time  and  the  fact  that  A2  <  1  is  used.  As  a  result, 


MFPT  can  then  be  calculated  using 


oo  A2  =  1, 


1  -  A2 


a2  <  i. 


The  MFPT  for  the  Europe  tour  is 


(A. 24) 


M  «  10,823. 


(A. 25) 


So,  if  p[0]  =  <p,  then  the  system  takes  approximately  10,823  steps  on  average. 
The  standard  deviation  of  FPT  can  be  calculated  by 

OO 

E[FPT2]  =  n2  Pr(X[n]  =  xu  X[n  -  1]  ^  x,) 


1  +  A2 

(1-A2)s 


E[FPT2]  -  (E[FPT])2  =  My/%. 


(A.26) 


Then  for  the  Europe  tour,  the  standard  deviation  of  FPT  is  M a/A^  ~  10,  822.  Note  that 
A2  being  close  to  one  results  in  a  standard  deviation  close  to  the  mean. 

The  MFPT  vector,  m,  gives  the  MFPT  for  each  state  and  it  is  defined  as 


nil  ■= 


i  =  1, 


1  +  Tijirij  otherwise. 


(A. 27) 
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Equation  (A. 27)  says  it  takes  zero  steps  to  go  to  the  halt  state  if  the  system  escaped 
already.  Otherwise,  the  number  of  steps  until  halt  is  1  less  after  a  step  is  taken.  For  the 
Europe  tour,  the  solution  to  (A. 27)  is 

0 

10,825 

1 

10,652 

rn «  .  (A. 28) 

10,825 

2,121 

10,674 

10,824 

This  vector  says  if  the  initial  state  is  X2,  then  10,825  steps  are  expected  before  escape. 
Note  that  London,  Paris,  Madrid,  Berlin  and  Rome  have  MFPT  close  to  the  system-wide 
MFPT  M.  Also  the  MFPT  of  Istanbul  and  Athens  are  very  small.  However,  Kiev  has  a 
MFPT  that  is  not  close  to  zero  or  the  system-wide  MFPT,  because  rn i  =  0,  m4  ~  M, 
and 

m6  =  T61mi  +  T64m4  +  T66m6  (A.29) 


implies 

m,  ~  =  2, 153.  (A.30) 

For  walking  robots,  the  picture  is  the  same.  Metastable  states  are  almost  equally  stable 
(m2  ~  m4  ~  m5  ~  rnj  ~  m8  ~  M)  and  some  states  are  doomed  to  fail  (m3  =  1).  Also 
some  states  are  risky  and  likely  to  fail  in  the  next  step,  otherwise  they  are  mapped  to 
the  metastablc  region,  just  like  Kiev.  This  structure  is  the  case  when  memory  constant, 
which  is  to  be  defined  shortly,  is  small. 
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By  using  (A. 9),  it  is  straightforward  to  obtain 


m  = 


(I 


0 


(A.31) 


(/  —  T')  is  invertible  when  A2  <  1,  which  is  the  hidden  assumption  made  while  defining 
the  MFPT  vector.  The  system-wide  MFPT  calculated  in  (A. 23)  can  be  also  obtained  by 


M  =  m'0  =  Vh  ...  1](I  -  i)"1  z.  (A.32) 

This  equivalence  makes  sense  because  each  state  has  its  own  MFPT,  and  MFPT  of  the 
metastable  distribution  is  just  a  convex  combination  of  each  state’s  MFPT  weighted 
according  to  <fi .  Indeed 


m  =  A-i1-  iKi-fr'z 
llzlli 

=  A-|i-  lKi-rrHi-T  +  fjz 
lzlll 

=  A-[i-  i]<i + a  - 

(A. 33) 

=  ir4-[i-  !]-  +  Tivrl1  -iKi-ir'Tz 


=  l  +  A2Vr[l  ...  lKI-fj-'z 

llzlll 

=  1  +  a 2M  M  —  1/(1  -  A2)  =  M. 

Note  that  M  is  upper  bounded  by  the  largest  element  in  m.  In  fact,  any  initial  state 
distribution,  p[0],  has  a  MFPT  that  is  a  convex  combination  of  the  m,  values. 

MFPT  of  the  metastable  distribution  is  useful  for  various  reasons.  First  of  all,  it  is 
a  lower  bound  for  average  steps  taken  from  at  least  one  of  the  states,  because  it  is  a 
convex  combination  of  mjS.  So,  there  exists  state(s)  at  least  as  stable.  Secondly,  it  is  a 
meaningful  and  intuitive  measure  of  overall  stability.  Often  systems  quickly  converge  to 
their  metastable  distributions,  where  MFPT  becomes  the  true  value.  Thirdly,  system- 
wide  MFPT  also  has  advantages  over  calculating  the  MFPT  vector.  In  case  T  is  large, 
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estimating  the  second  largest  eigenvalue  is  relatively  easy,  whereas  finding  the  inverse  to 
calculate  MFPT  vector  costs  more  time.  Also,  a  scalar  representing  the  stability  is  much 
easier  to  understand  than  a  possibly  huge  vector. 

Assume  T  has  distinct  eigenvalues.  Please  see  [Saglam  and  Byl,  2014a]  for  other  cases. 
Denote  the  initial  distribution  as 

P[0]  =  C1V1  +  C2V2  +  .-QVf 

Note  that  C\  =  1  to  have  1 1  p [0]  1 1  x  =  1.  Then, 

p[n]  =  (T')"p[0]  =  vi  +  C2A2V2  +  ...  +  ceX^vi. 

In  the  light  of  this  section,  the  metastable  distribution  is  also  given  by 

(pi  =  lim  Pr (X[n\  =  \ X[n\  ^xi\), 

n— >•  00 

when  the  limit  exists.  It  exists,  for  example,  when  the  eigenvalues  are  distinct 

A. 2. 2  How  Quickly  is  the  Initial  Distribution  Forgotten? 

The  initial  distribution  (condition)  is  forgotten  if  either  the  distribution  is  the  metastable 
distribution  (p  =  cp)  or  the  game  ended  (p  =  [1  0  ...  0]').  So,  the  question  can  be  para¬ 
phrased  as  “how  quickly  does  the  system  converge  toward  the  metastable  distribution, 
given  it  is  not  absorbed  yet?”.  For  systems  with  distinct  eigenvalues, 

memory  constant  = - — —  (A. 37) 

1  —  I  a3  1 

is  a  dimensionless  indicator  of  the  convergence  speed. 

The  definition  of  the  memory  constant  is  motivated  by  the  fact  that  Y^=o  = 

1/(1  —  A i)  for  | Ai|  <  1.  Although  this  property  also  holds  for  complex  A*  values,  since 

|  A3 1  =  |  A3  |7T,  memory  constant  uses  1/(1  —  |A3|)  to  quantify  how  many  steps  it  takes  before 

75 


(A. 34) 


(A. 35) 


(A. 36) 


First  Passage  Value 


Chapter  A 


vanishing.  Higher  A3  results  in  larger  1/(1  —  | A3 1 ) ,  which  means  the  initial  condition  is 
forgotten  slower  and  more  steps  are  required  to  forget  the  same  amount  of  initial  condition 
information.  A3  gives  the  worst  case  scenario,  so  the  memory  constant  is  a  conservative 
value.  (1/(1  —  | A3 1 )  is  divided  by  M  =  1/(1  —  A2)  for  a  relative  memory  constant,  which 
is  upper  bounded  by  1. 

Note  that  the  memory  constant  being  smaller  than  e  implies  M  >  1/e.  Thus,  small 
memory  constants  require  metastability.  If,  on  the  other  hand,  the  memory  constant  is 
very  close  to  one,  then  there  is  another  mode  almost  as  stable  as  the  one  associated  with 
A2.  In  such  cases  it  may  be  useful  to  use  the  next  |A*|  instead  of  |  A3 1 ,  until  a  small  memory 
constant  is  obtained.  In  particular,  when  the  eigenvalues  are  not  necessarily  distinct,  it 
might  be  the  case  that  A2  =  |A3|. 

For  the  Europe  tour  example,  the  memory  constant  is  1.1688  x  10-4.  This  small 
memory  constant  indicates  the  structure  mentioned  earlier,  where  there  are  metastable 
states,  controllable  unreachable  states,  doomed  states,  risky  states,  and  halt  state.  Some 
other  motivating  applications  of  MFPT  calculation  is  presented  later  in  this  Chapter. 

A. 3  Mean  First  Passage  Value 

Calculating  the  discrete  time  to  a  state  of  interest  has  potential  to  be  greatly  useful, 
but  in  some  applications,  metrics  other  than  expected  number  of  steps  before  absorption 
are  of  interest.  To  illustrate,  instead  of  how  many  decisions  it  took  before  reaching  Istan¬ 
bul,  travel-time  or  distance  to  Istanbul  might  be  needed.  For  this  matter,  MFPT  can  be 
generalized  as  mean  first  passage  value  (MFPV),  where  “value”  depends  on  the  task,  for 
example,  mean  first  passage  distance.  So,  MFPT  is  a  special  case  of  MFPV,  where  value 
is  discrete  time  steps. 
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Redefining  m  from  (A. 27)  as  the  MFPV  vector  gives  the  MFPV  for  each  state  as 


i  —  1 


m,  :  = 


T.ijRij  +  ^  Tijirij  otherwise, 


(A. 38) 


where  Rij  is  the  reward  (value)  of  transitioning  from  state  Xi  to  Xj.  Then,  vector  m  can 
be  calculated  as 

r 

0 


rn  = 


(i-r  )-■ 


3 


Y^TijRij 

l  j 


(A. 39) 


And  the  (system-wide)  MFPV  is 


M  =  m'(p. 


(A. 40) 


For  the  metastable  Europe  tour  example,  consider  the  information  provided  in  Ta¬ 
ble  A.l.  Then,  using  (A. 40)  mean  Erst  passage  distance  can  be  calculated  to  be  325.68 
thousand  km.  This  calculation  is  achieved  by  setting,  for  example,  i?24  =  1,  098  km, 
which  corresponds  to  the  distance  between  London,  and  Berlin. 
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London  (2)  -  Paris  (8) 
London  (2)  -  Berlin  (4) 
Madrid  (5)  -  Paris  (8) 
Rome  (7)  -  Paris  (8) 
Berlin  (4)  -  Paris  (8) 
Berlin  (4)  -  Kiev  (6) 
Berlin  (4)  -  Istanbul  (1) 
Kiev  (6)  -  Istanbul  (1) 
Rome  (7)  -  Istanbul  (1) 
Athens  (3)  -  Istanbul  (1) 


Time  (h:m)  Distance  (km) 


5:06 

454 

10:25 

1,098 

11:10 

1,270 

12:46 

1,419 

9:18 

1,055 

14:41 

1,329 

21:39 

2,210 

19:00 

1,459 

22:46 

2,262 

11:13 

1,095 

Table  A.l:  Travel  times  and  distances  for  the  roads  of  Europe  map  in  Figure  A.l. 
For  the  game  explained  in  that  figure’s  caption,  the  values  in  this  table  are  used  to 
calculate  the  expected  distance  and  time  before  visiting  Istanbul. 


The  travel-time  only  (excluding  the  days  spent  in  a  city)  is  calculated  by  setting 


Ra  =  0.  Then,  it  takes  128  days  on  average  before  Istanbul.  The  MFPV  vector  for  this 


case  is 


m  = 


0 

128.4756 

0.4674 

126.6098 

128.7334 

25.9478 

127.0149 

128.2681 


days. 


(A. 41) 
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Remember  that  a  random  decision  is  made  after  spending  a  day  in  the  city.  For  the 
MFPV  calculation,  to  include  those  days  spent  in  a  city,  the  reward  needs  to  be  set  such 
that  Ru  =  1  day.  This  setting  results  in  a  MFPV  of  29  years,  which  is  much  higher 
than  128  days.  One  reason  of  this  difference  is  because  once  in  Paris,  there  is  a  0.9825 
probability  to  stay  in  Paris. 

In  addition,  MFPV  does  not  need  to  have  a  physical  meaning.  Multiple  objectives 
can  be  included  in  a  single  value  function,  for  example  for  walking  robots  failure  events, 
such  as  falling  down,  can  be  penalized  while  also  rewarding  fast  speed  and  low  energy 
use.  The  value,  or  the  cost  function,  does  not  need  to  have  a  physical  correspondence. 
Number  of  steps  minus  10-3  times  energy  consumption  is  a  valid  value  definition.  Please 
see  [Saglarn  and  Byl,  2014b]  for  further  details  and  usage  of  this  example. 

A. 4  Confidence  Levels  on  Value 

In  some  applications,  instead  of  the  mean  FPV,  a  conservative  FPV  bound  for  a 
particular  “confidence  level”  pr  is  needed.  In  other  words,  a  value  above  lower  first  passage 
time  (LFPT)  would  be  observed  with  probability  pr,  and  a  value  below  upper  first  passage 
time  (UFPT)  would  be  observed  with  probability  pr. 

A. 4.1  First  Passage  Time 

The  probability  of  taking  more  than  LFPT  steps  is  A2FP1  •  Then,  the  lower  bound  on 
number  of  steps  taken  with  probability  pr  can  be  calculated  by 

LFPT(pr)  =  log\2pr.  (A. 42) 

Note  that  LFPT(1)=0,  so  taking  only  a  single  step  is  guaranteed,  which  leads  to  the  halt 
state.  The  probability  of  taking  less  than  LIFPT  steps  is  1  —  A2  .  Then,  the  upper 
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bound  on  number  of  steps  taken  with  probability  pr  is 

UFPT(pr)  =  log\2(l  —  pr)  +  1.  (A. 43) 

Since  lint  UFPT(pr)  =  oo,  it  may  theoretically  take  infinitely  many  steps  before  con- 

pr— >-1 

verging  to  the  halt  state.  Then  FPT  can  be  limited  for  a  given  probability  as 


LFPT  <  FPT  <  UFPT. 


(A. 44) 


Having  LFPT<UFPT  requires  pr  >  A2/(l  +  A2),  where  right  hand  side  equals  0.5  for 
A2  =  1.  To  illustrate  the  advantage  of  looking  to  FPT,  consider  the  metastable  Europe 
tour  example.  The  probability  of  the  journey  taking  more  than  M  =  10,  823  days  is  only 
36.79  %.  On  the  other  hand, 


1,140  <  FPT(0.9)  <  24,921, 
108  <  FPT(0.99)  <  49,842. 


(A. 45) 


These  numbers  are  not  coincidence.  As  A2  — >  1,  the  probability  of  taking  M  steps  is 


lim  A^(1-A2)  =  -  «  0.3679.  (A.46) 

A2— rl  ""  e 

In  fact,  0.3679  is  an  upper  limit  for  the  probability  of  taking  1/(1  —  A2)  steps  for  any  A2. 
In  addition,  when  A2  and  pr  are  close  to  one, 

LFPT(pr)  =  =  (1  -  pr)M,  (A.47) 

m(  A2)  1  —  A2 


UFPT(pr)  =  +  1  «  -ln(  1  -  pr)M,  (A.48) 

ln[  A2) 

where  M  denotes  MFPT. 

LFPT  is  of  interest  for  walking  systems,  to  give  a  conservative  bound  on  steps  to 

failure,  while  LIFPT  would  be  helpful  in  modeling  epidemics,  to  obtain  a  conservative 

time  when  everyone  is  healthy,  which  means  recurrence  to  an  “all-healthy”  system  state. 
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A. 4. 2  First  Passage  Value 


Using  MFPT  and  MFPV  value  per  step  can  be  used  to  write 


FPV 


MFPV 

MFPT 


FPT, 


(A. 49) 


where  FPV  denotes  first  passage  value.  When  the  value  is  the  distance  in  the  metastable 
Europe  tour  example, 


3.4  x  104  <  FPV(0.9)  <  7.5  x  105, 
3.2  x  103  <  FPV(0.99)  <  1.5  x  106. 


(A. 50) 


So,  with  probability  0.99,  the  journey  takes  more  than  3.2  thousand  or  less  than  1.5 
million  kilometers. 


A. 5  Controlling  the  Europe  Tour 


In  order  to  control  a  system,  a  goal  is  needed.  For  instance,  going  to  Istanbul  as 
quickly  as  possible  could  be  a  goal  for  the  Europe  tour.  It  is  then  useful  to  quantify  the 
performance  toward  achieving  that  goal.  For  example,  number  of  days  before  reaching 
Istanbul  would  be  a  meaningful  metric  to  minimize.  FPV  analysis  provides  useful  metrics 
for  many  applications.  To  illustrate,  the  more  steps  a  walking  robot  takes  before  failure, 
the  more  stable  it  is  said  to  be.  Once  the  performance  is  quantified,  the  control  can  be 
optimized. 

Control  action  comes  into  play  in  two  ways:  Low-level  and  high-level.  For  the  Europe 
tour,  say  the  probability  for  city  change  is  directly  proportional  to  population  to  the 
power  k,  e.g., 


Probability  of  staying  in  Madrid 


(Population  of  Madird)fc 

(Population  of  Madird)fc  +  (Population  of  Paris) k 


(A. 51) 
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where  k  G  {1,2,3}.  This  means  a  Markov  Decision  Process  with  three  control  actions. 
Previous  analysis  was  a  special  case  of  this  setting  with  k  —  1  (see  (A.l)).  Any  choice  of 
k  gives  a  Markov  Chain,  for  which  FPV  is  calculated  as  shown.  Choosing  afcora  set  of 
k  values  for  the  whole  system  is  the  low-level  control  problem  we  discuss  in  Chapter  4. 
Moreover,  if  multiple  k  values  are  available,  deciding  on  which  k  to  use  for  a  given  city 
is  the  high-level  control  problem,  which  we  tackle  in  Chapter  5. 

A.  6  Applications 

A. 6.1  Coin  Toss 

Consider  tossing  an  unfair  coin,  for  which  the  probability  of  having  tails  is  p.  As 
illustrated  in  Figure  2.5,  when  the  number  of  flips  before  two  heads  in  a  row  is  of  interest, 
three  states  are  possible:  (1)  Heads-heads,  (2)  Tails  in  the  last  flip  (including  ‘not-flipped 
yet’),  (3)  Tails- Heads  (including  ‘flipped  once  and  it  was  heads’).  When  p  =  0.01,  the 
system-wide  MFPT  is  M  10,099  and  the  memory  constant  is  1.0001  x  10-4.  The 
MFPT  vector  and  the  metastable  distributions  are  given  by 

0  0 

m «  10, 100  and  <j>  «  0.9901  (A.52) 

10,000  0.0099 

So,  if  the  initial  state  is  x2,  10, 100  flips  are  expected  before  two  heads  in  a  row. 

A. 6. 2  Epidemics 

For  the  susceptible-infected-susceptible  (SIS)  model  explained  in  [Ahn  and  Hassibi, 

2013],  the  number  of  discrete  time  steps  to  everyone  being  healthy  can  be  calculated.  As 

a  toy  example,  consider  a  network  of  only  2  people  with  no  birth  or  death.  Each  person 
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is  either  susceptible  or  infected,  so  four  states  are  possible  as  listed  in  Table  A. 2.  Note 
that  when  both  patients  are  susceptible  (State  1),  the  state  does  not  change,  because 
noone  can  become  infected. 


State  1 

State  2 

State  3 

State  4 

Patient  1 

Susceptible 

Susceptible 

Infected 

Infected 

Patient  2 

Susceptible 

Infected 

Susceptible 

Infected 

Table  A. 2:  Possible  states  for  SIS  model  of  epidemics  with  2  people 


Let  ‘the  probability  of  recovery  when  a  node  is  infected’  be  5  =  0.01,  and  ‘the  prob¬ 
ability  to  be  infected  when  the  other  node  is  infected’  be  0  =  0.8.  Then,  the  stochastic 
matrix  is  given  by 

10  0  0 

(1-0)S  (l-6)(l-0)  05  0(1 -S) 

(1  -P)5  05  (l-5)(l-/3)  0(1-8) 

55  5(1-5)  5(1-5)  (1  —  5)(1  —  5) 

(A. 53) 

10  0  0 

0.002  0.198  0.008  0.792 

0.002  0.008  0.198  0.792 

0.001  0.0099  0.0099  0.9801 

A2  is  such  that  MFPT  is  M  —  6.8383  x  103.  In  addition,  A3  =  0.19,  and  the  memory 
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constant  is  1.8  x  10  4.  The  MFPT  vector  and  the  metastable  distribution  are  given  by 

0  0 

6822.7  0.0122 

m  fa  and  <fi  ~  .  (A. 54) 

6822.7  0.0122 

6838.7  0.9757 

Small  memory  constant  results  in  states  having  a  MFPT  close  to  either  zero  or  M.  It  is 
interesting  to  see  that  with  a  high  probability  (0.9757),  both  people  are  infected  during 
the  epidemic  while  all  healthy  state  is  the  stable  one. 

Estimating  survival  times  of  epidemic  spreads  can  potentially  be  useful  for  public 
health  research  to  optimize  actions  while  minimizing  deaths  and  expenditures. 

A. 6. 3  Driving  a  Car 

Arguably,  given  enough  time  (millions  of  years  if  necessary),  any  driver  will  be  in¬ 
volved  in  an  accident.  The  same  is  true  for  autonomous  cars  including  the  Google  car. 
The  expected  miles  driven  between  accidents  can  be  estimated  to  quantify  and  increase 
the  safety. 

A. 6.4  Queueing  and  Polling  Systems  -  Traffic  Intersection 

Queueing  theory  deals  with  waiting  lines  and  queues.  One  application  of  polling  sys¬ 
tems,  which  are  extensions  of  queueing  systems,  is  traffic  intersections  with  multiple  lines 
as  illustrated  in  [Miculescu  and  Karaman,  2014],  As  a  toy  example,  consider  the  intersec¬ 
tion  with  two  lines  shown  in  Figure  A. 2.  The  goal  of  the  traffic  light  of  the  intersection  is 
to  avoid  traffic  jam,  which  is  defined  as  having  more  than  4  cars  in  a  lane.  Assume  that 
each  turn  a  single  car  may  pass  through  the  intersection  and  new  cars  come  stochastically 

to  each  line.  Let  the  number  of  cars  arriving  at  each  lane  have  Poisson  distribution  with 
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parameter  A  =  0.1.  This  intersection  can  be  easily  treated  as  a  Markov  Decision  Process. 
If  the  controller  always  allows  the  same  line  to  proceed,  the  expected  discrete-time  to 
traffic  jam  is  around  10.  The  exhaustive  policy  allows  a  line  to  proceed  until  that  line 
is  empty  and  then  switches  to  the  other  lane.  This  policy  results  with  1.09  x  106  turns 
before  traffic  jam. 


Figure  A. 2:  An  intersection  with  two  lanes.  Cars  are  either  going  from  west  to  east  or 
south  to  north.  Blue  cars  are  waiting  to  pass  through  the  intersection.  Cyan  colored 
car  recently  moved  to  the  north  side  of  the  intersection.  The  arrow  shows  which  lane 
is  allowed  to  proceed.  In  this  toy  example,  traffic  jam  corresponds  to  5  or  more  cars  in 
a  lane.  The  goal  of  the  traffic  lights  is  to  minimize  the  traffic  jam  rates.  Each  turn  a 
single  car  may  pass  through  the  intersection  and  new  cars  come  stochastically  to  each 
line.  The  number  of  cars  arriving  at  each  lane  has  Poisson  distribution  with  parameter 
A  =  0.1.  This  intersection  can  be  easily  treated  as  a  Markov  Decision  Process. 
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A.  7  Conclusion 

This  appendix  studied  first  passage  time,  which  is  a  survival  metric.  It  presented 
system-wide  mean  first  passage  time  (MFPT),  which  is  calculated  using  the  second 
largest  eigenvalue  of  the  stochastic  transition  matrix.  The  chapter  also  illustrated  that 
for  metastable  systems,  system-wide  MFPT  is  an  accurate  indicator  across  a  large  set 
of  states  of  those  frequently  visited.  Then  mean  first  passage  value  (MFPV)  was  intro¬ 
duced,  which  gives  a  more  general  value  of  interest,  such  as  energy  expenditure,  distance, 
or  time.  Bounds  on  first  passage  value  (FPV)  was  provided  for  a  given  confidence  level. 
The  chapter  also  showed  how  these  tools  explained  can  be  used  to  low-level  and  high-level 
control  hybrid  systems. 
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The  methods  of  this  paper  are  appropriate  and  useful  for  various  systems.  To  show 
applicability  to  high  degree  of  freedom  (DOF)  robots  (compared  to  2- link  walker  in  [Byl 
and  Tedrake,  2009]  and  3- link  walker  of  [Chen  and  Byl,  2012]),  the  analysis  in  this  thesis 
is  carried  out  with  the  5-link  biped  shown  in  Figure  B.l.  The  model  is  based  on  the 
RABBIT  robot  shown  in  Figure  1.4c  [Chevallereau  et  al.,  2003]. 

RABBIT  has  point  feet,  5  degrees-of- freedom  (DOF)  and  four  actuators  located  at 
the  internal  joints.  Since  the  robot  cannot  produce  ankle  torques,  it  is  underactuated 
by  1  DOF.  While  studying  RABBIT,  we  restrict  our  attention  to  planar  motion  and 
assume  links  are  rigid.  The  angles  shown  in  the  figure  form  q  :=  [q\  q2  q 4  gs]T,  the 

ten  dimensional  state  of  the  robot  is  defined  as  x  :=  [ qT  qT]T,  and  sh  denotes  the  height 
of  the  swing  foot.  The  model  parameters  are  taken  from  RABBIT  [Westervelt  et  al., 
2007]  and  listed  in  Table  B.l. 

Depending  on  the  number  of  legs  in  contact  with  the  ground,  the  robot  is  either  in 
the  single  or  double  support  phase.  Walking  consists  of  these  two  phases  in  sequence.  The 
single  support  (swing)  phase  has  continuous  dynamics.  Using  a  Lagrangian  approach,  the 
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Table  B.l:  Model  parameters  for  the  five-link  robot  based  on  RABBIT  [Westervelt 
et  al.,  2007] 


Description 

Parameter  Label 

Value 

Torso  Mass 

rriT 

12  kg 

Femur  Mass 

mf 

6.8  kg 

Tip  Mass 

mt 

3.2  kg 

Torso  Inertia 

It 

1.33  kg  m2 

Femur  Inertia 

If 

0.47  kg  m2 

Tip  Inertia 

It 

0.20  kg  m2 

Torso  Length 

It 

0.63  m 

Femur  Length 

h 

0.4  m 

Tip  Length 

h 

0.4  m 

Torso  Mass  Center 

Pt 

0.24  m 

Femur  Mass  Center 

Pf 

0.11  m 

Tip  Mass  Center 

Pt 

0.24  m 

Gravitational  Acceleration 

go 

9.81  m/s2 

Gear  Ratio 

ng 

50 

Ground  Friction  Coefficient 

ps 

0.6 

Saturation  Limit 

tt sat 

50  Nm 
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Figure  B.l:  Illustration  of  the  five- link  robot  with  identical  legs  that  is  used  as  our 
biped  model  in  this  thesis.  This  model  is  based  on  RABBIT  shown  in  Figure  1.4c. 

equations  of  motion  can  be  derived  in  the  canonical  form  of 

D(q)q  +  C(q,  q)q  +  G(q)  =  Bu,  (B.l) 


where  u  is  the  input.  An  important  point  is  that  q  consists  of  the  five  angles  depicted  in 
Fig.  B.l  whereas  u  has  only  four  elements.  The  system  has  this  degree  of  underactuation 
because  of  the  passive  joint  at  the  stance  tip.  The  swing  dynamics  can  be  equivalently 
expressed  as 


D~\-Cq-G  +  Bu ) 


f(x)+g(x)u. 


(B.2) 


During  the  swing  phase  of  a  successful  step,  the  swing  leg  takes  off  from  the  ground, 
passes  the  stance  leg  and  lands  on  a  further  point  on  the  ground.  So,  each  single  support 
phase  starts  and  ends  with  double  support  phases.  As  the  robot  has  point  feet,  the  double 
support  (stance)  phase  can  be  well  captured  as  an  instantaneous  impact  event.  The  robot 
experiences  this  impact  whenever  the  swing  foot  hits  the  ground  (sh  =  0)  from  above 
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(sh  <  0).  Let  us  denote  this  impact  surface,  aka  the  jump  set,  by  S.  Then  we  have 

A0  q ~ 

x+  =  A(x~)  :=  x  G  S,  (B.3) 

\(q~)  q ~ 

where  x~  =  [( q~ )T  ( q~)T]T  and  x+  are  the  states  just  before  and  after  the  impact  re¬ 
spectively.  Conservation  of  energy  and  the  principle  of  virtual  work  gives  the  mapping 
A  [Westervelt  et  ah,  2007],  [Hurmuzlu  and  Marghitu,  1994],  Essentially,  this  model  as¬ 
sumes  instantaneous,  inelastic  collisions  between  the  swing  leg  tip  and  the  ground,  with 
instantaneous  changes  in  velocities  to  reflect  the  effects  of  impulsive  forces  exerted  on 
the  robot. 

Although  the  robot’s  position  and  orientation  do  not  actually  change  according  to 
the  impact  model,  we  relabel  the  legs  every  step,  so  the  previous  swing  leg  becomes  the 
new  stance  leg  and  vice  versa.  So,  Aq  is  such  that  we  have  A q[qi  q2  q%  q±  q$}T  = 
[q2  q\  qA  q-s  qb]T ■  Without  relabeling,  a  periodic  walking  gait  would  have  two  steps  as 
its  period. 

Since  a  step  consists  of  a  single  support  phase  and  an  impact  event,  it  has  hybrid 
dynamics  as  illustrated  in  Figure  B.2.  In  our  modeling,  we  assume  the  impact  event 
occurs  first,  but  the  order  in  the  definition  of  a  step  is  an  arbitrary  choice,  so  long  as 
one  remains  consistent  after  deciding.  As  seen  in  Figure  B.2,  for  a  step  to  be  successful, 
certain  “validity  conditions”  must  be  satisfied,  which  we  list  next.  After  impact,  the 
former  stance  leg  must  lift  from  ground  with  no  further  interaction  with  the  ground  until 
the  next  impact.  Also,  the  swing  foot  must  have  progressed  past  the  stance  foot  before 
the  impact  of  the  next  step  occurs.  Only  the  feet  should  contact  the  ground.  Furthermore, 
the  force  on  the  stance  tip  during  the  swing  phase,  and  the  force  on  the  swing  tip  at  the 
impact  should  satisfy  the  friction  constraint  given  by 


friction 


/A  >  |  Ft 


transversal 
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If  validity  conditions  are  not  met,  the  step  is  labeled  as  “unsuccessful”  and  the  system 
is  modeled  as  transitioning  to  a  failure  state.  This  is  a  conservative  model,  because  in 
reality  violating  these  conditions  does  not  necessarily  mean  failure. 

Swing  tip  touched  the  ground 


Figure  B.2:  Hybrid  model  of  a  step  and  the  failure  state.  Swing  phase  is  governed  by 
continuous  dynamics  or  flows,  which  are  interrupted  by  discontinuous  impacts. 

Image  inspired  by  [Westervelt  et  al.,  2007]. 
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To  study  rough  terrain  walking,  the  methodology  in  Chapter  2  assumes  stochastically 
changing  terrain,  which  requires  modeling  ground  with  finitely  many  dimensions.  Two 
most  elementary  models,  which  are  only  one  dimensional  disturbances  to  the  robot,  are 
presented  in  Figure  C.l.  This  thesis  mainly  assumes  sloped  terrain  model,  but  results 
when  heights  are  varying  instead  are  also  presented  in  Section  4.3. 


1 


Figure  C.l:  One  dimensional  terrain  disturbances 
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A  natural  concern  is  related  to  the  validity  of  these  terrain  models.  Consider  the  higher 
dimensional  terrain  disturbance  in  Figure  C.2.  Red  dashed  line  shows  the  trajectory  of 
the  swing  foot  for  a  specific  initial  condition  and  the  controller  employed.  Given  this 
trajectory,  all  three  terrains  shown  on  the  right  hand  side  of  Figure  C.2  are  equivalent 
to  the  original  terrain.  In  particular,  the  terrain  in  the  right  bottom  is  generated  with 
double-sloped  terrain  model.  In  addition  to  the  sloped  terrain  of  Figure  C.la,  this  model 
has  an  extra  dimension  which  stores  the  slope  experienced  at  impact.  Adopting  double- 
sloped  terrain  is  a  topic  of  future  work. 


Figure  C.2:  Higher  dimensional  terrain  disturbances 
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Appendix  D 

Extended  Hybrid  Zero  Dynamics 


Any  one  of  the  numerous  control  schemes  proposed  for  bipedal  locomotion  can  be  opti¬ 
mized  and  benchmarked  using  the  methods  presented  in  this  thesis.  As  case  demonstra¬ 
tion,  we  illustrate  our  results  mainly  on  the  extended  hybrid  zero  dynamics  framework 
explained  in  this  Appendix. 

In  the  literature  of  bipedal  locomotion  control,  typically  time  trajectories  are  imposed 
to  achieve  a  desired  walking  motion.  However,  time-invariant  controllers  have  critical 
advantages  over  their  time-varying  counterparts.  For  example,  imagine  someone  holding 
the  robot  in  the  mid-step  for  a  second.  After  released,  instead  of  hurrying  to  catch  up 
the  time  trajectories  for  re-synchronization,  the  robot  better  continue  the  interrupted 
motion.  For  this  reason,  defining  an  internal  clock  which  defines  how  far  the  robot  is 
at  a  step  is  a  meaningful  approach,  which  has  led  to  the  hybrid  zero  dynamics  (HZD) 
framework  [Westervelt  et  ah,  2003].  The  controller  scheme  introduced  in  this  thesis  is 
a  generalization  of  the  now-famous  HZD  control  in  two-ways.  First,  we  modify  it  for 
non-flat  terrain  as  in  [Yadukumar  et  ah,  2012]  and  [Saglam  and  Byl,  2015].  Secondly, 
and  much  more  importantly,  HZD  controllers  assume  deterministic  terrain  by  design, 
where  the  internal  clock  ticks  from  0  to  1  at  every  step.  In  this  thesis,  we  extend  the 
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trajectories  to  cover  from  0  to  1.4,  thus  the  extended  hybrid  zero  dynamics  (EHZD).  The 
start  and  end  points  were  empirically  chosen  and  can  be  extended  further  in  both  ways 
when  necessary.  Finally,  we  use  B-splincs  instead  of  Bezier  polynomials  just  as  a  design 
choice.  For  optimization  and  benchmark  results  of  this  strategy  along  with  the  original 
HZD  controller  and  a  piecewise  reference  trajectories  using  a  sliding  mode  control,  see 
Chapter  4. 


D.l  The  Structure 


First,  we  choose  a  phase-variable  which  will  work  as  an  internal-clock.  The  phase 
should  be  monotonic  through  the  step,  e.g.,  hip  location  moves  forward  with  an  al¬ 
most  constant  velocity  in  humans  [Ames,  2012],  As  in  [Westervelt  et  ah,  2007],  we 
choose  9  shown  in  Figure  D.l  as  our  phase  variable,  which  corresponds  9  =  cq  with 
c  =  [—1  0  —  1/2  0—1]  since  femur  and  tip  lengths  are  equal  in  RABBIT  robot. 


Figure  D.l:  Phase  9  and  swing  tip  height  sh.  Since  femur  and  tip  lengths  are  equal 
in  RABBIT  robot,  9  =  cq  with  c  =  [—10  — 1/20  —  1],  The  robot  experiences  an 
impact  whenever  sh  =  0  and  sh  <  0,  i.e. ,  swing  tip  hits  the  ground  from  above. 


95 


Extended  Hybrid  Zero  Dynamics 


Chapter  D 


Second,  we  decide  on  four  independent  variables  to  control  h0  because  the  robot  has 
four  actuators  only.  It  is  intuitive  to  control  the  relative  (internal)  angles,  i.e.,  ho 
[q\  q-2  q2  q-s  q^ •  Then  ho  is  in  the  form  of  ho  =  H0  q,  where  H0  =  [I4  0].  Changing  h0 
later  is  an  easy  step  in  this  framework  and  it  does  not  affect  the  performance  to  the 
author’s  experience. 

Third,  let  hd(6)  be  the  references  for  ho-  Then  the  tracking  error  is  given  by 

h(q)  :=  h0(q )  -  hd(6)  =  H0g  -  hd(cq).  (D.l) 


Taking  the  first  derivative  with  respect  to  time  yields 

h=txi=txS{x)  =:  £/A=<Vft./>. 
dh 


(D.2) 


where  we  used  the  fact  that  —g(x)  =  0.  For  the  clarity  of  later  equations,  we  will  use 

ux 

the  Lie  derivative  (£)  notation.  Then,  we  have 


h  =  Cjh  +  CgCfh  u. 


(D.3) 


Substituting  the  linearizing  controller  structure 


u  =  (CgCfh )  1(—C2fh  +  v) 


(D.4) 


to  (D.3)  yields 

h  =  v. 


(D.5) 


There  are  various  methods  to  design  v  in  (D.4)  to  force  h  (and  h)  to  zero  [Ames  et  ah, 
2012],  While  even  a  PD  controller  could  be  sufficient,  a  sliding  mode  control  (SMC)  is 
preferable,  due  to  its  finite  time  convergence  [Sabanovic  and  Ohnishi,  2011],  which  we 
summarize  next. 
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D.2  Sliding  Mode  Control 

Remember  that  h  corresponds  to  tracking  error.  The  generalized  error  is  defined  as 

Gi  =  hi  +  hi/n,  *  =  {1,2,  3, 4},  (D.6) 

where  r^s  are  time  constants  for  ht.  Note  that  when  the  generalized  error  is  driven  to 
zero,  i.e.  at  =  0,  we  have 

0  =  hi  +  hi/ri .  (D.7) 

The  solution  to  this  equation  is  given  by 

hi(t)  =  hi(t0 )  exp(—(t  -  t0)/Ti),  (D.8) 

which  drives  hi  to  0  exponentially  fast.  Finally,  v  in  (D.4)  is  given  by 

Vi  =  -kilai^-'sign^Gi),  *  =  {1,2, 3, 4},  (D.9) 

where  ki  >  0  and  0.5  <  on  <  1  are  called  the  convergence  coefficient  and  convergence 
exponent  respectively.  Note  that  if  we  had  cp  =  1,  this  would  simply  be  a  standard  PD 
controller.  Then,  /ci/r*  and  ki  are  analogous  to  the  proportional  and  derivative  gains  of 
a  PD  controller.  However,  0.5  <  a*  <  1  ensures  finite  time  convergence.  In  this  thesis  we 
used  a*  =  0.75,  r*  =  0.1,  ki  =  10  for  all  controllers,  but  hd  was  different  for  each  low-level 
controller. 

What  the  controller  does  is  driving  the  system  to  the  “zero  dynamics  manifold”, 
noted  by  Z,  where  h  =  0  and  h  =  0.  The  following  two  sections  consider  the  dynamics 
on  this  manifold.  The  final  goal  is  to  design  hd  such  that  the  desired  walking  motion  is 
generated. 
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D.3  Swing  Phase  Zero  Dynamics 


Let  V  be  the  potential  energy  of  the  robot,  70(g)  be  the  last  row  of  D  and  define 

dV 

7  :=  7o Q-  Then,  C„ 7  =  0  and  Cf'y  =  —  — —  .  For  x  E  Z  we  transform  coordinates  by 

dq5  z 


6  =  0,  6  =  7, 


(D.10) 


to  equivalently  express  the  system  dynamics  during  the  swing  phase  as 


6  =  cq,  6  =  Cf'y, 


(D.ll) 


where  the  right-hand  sides  are  evaluated  at 


q  =  H- 


(because  =  9  =  cq  and  H0g  =  h0  =  hrj  for  x  e  Z), 


(D.12) 


rdhi  - 1  r  1 

v  0 

and  q  =  0<1  (because  h  =  0  for  x  G  Z  and  £2=7  =  7o(?)- 

7o  ^2 


To  evaluate  q,  we  can  alternatively  differentiate  q  from  above  to  get 

~d  hd' 

q  =  H”1  99  0 


(D.13) 


Using  (D.12)  we  can  write  (D.ll)  as 


ii  =  « 1(6)6,  6  =  «2(6)- 


(D.14) 
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D.4  Hybrid  Zero  Dynamics  Impact 


As  we  will  see,  h,j  will  be  designed  such  that  x~  G  Z  implies  x+  G  Z.  Now  assume 
the  swing  foot  passed  the  stance  foot  and  had  an  impact.  Let  q^  denote  the  angles  just 
before  the  impact  on  h(j.  Then,  q jj"  is  a  solution  to 


%) 

sh 


=  0, 


(D.15) 


where  sh  is  the  swing  tip  height  as  drawn  in  Figure  D.l.  At  the  impact,  the  angles 
are  relabeled  as  in  (B.3).  So,  after  the  impact  we  have  q jj"  =  A9gQ  .  The  phases  at  the 
beginning  and  end  of  the  step  are  simply  0+  :=  cq$  and  9~  :=  cq$  respectively. 

Similarly  let  %  and  q jj"  denote  the  velocities  just  before  and  after  the  impact  respec¬ 
tively.  £2  and  £2  are  the  corresponding  £2  values.  Then,  using  (D.12)  we  get 


(D.16) 


"  dh  .  , 

-1 

-  - 

%  = 

0 

7o(?o) 

1 

£2  £2 

From  (B.3),  we  obtain  q jj"  =  A ^(rfo  )  %  and  =  7o(?o")  qt ■  Defining 


$zero  •  7o(0'o  )  A<j((/o  )  A<j, 

the  impact  map  on  zero  dynamics  manifold  is  given  by 

Zf  =  0+,  &  =  SZero&- 


(D.17) 


(D.18) 


D.5  Poincare  Analysis 

We  then  define  a  Poincare  section  just  before  the  impact.  We  already  know  the  angles 
are  q^ ■  To  find  the  fixed  point  on  this  Poincre  map  (if  it  exists),  we  also  need  a  fixed 
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£2  .  Define  £2b  :=  ff/2.  &b  and  £2b  denote  its  value  at  the  beginning  and  end  of  the  step 
respectively.  Then,  we  have 


d&b  k  2(6 


d£i  «u(6)' 


Integration  over  a  step  gives 


, ~  _ ,  +  _  r 

?2S  «.(6)  6' 

Since  impact  maps  £2 b  to  d^ero^2 b,  the  fixed  point  £26  is  obtained  by  solving 


r '0 


F*  —  F*  = 

F<2b  °zero>2  b 


«2(f 


Jd+  «l(6 

Then,  the  fixed  points  for  the  zero  dynamics  are 


£  =  0-,  C2  = 


-2 


1  -  82 


r 

10+  (<Cl 


(D.19) 


(D.20) 


(D.21) 


(D.22) 


D.6  Under  deterministic  conditions 

[Westervelt  et  al.,  2007]  carries  a  Poincare  analysis,  where  the  stability  is  formulated 
analytically  and  shown  numerically.  Using  the  fixed  point  as  initial  condition,  they  simu¬ 
late  the  2D  dynamics  described  in  (D.ll)  until  9  =  6~  and  calculate  the  steady-state  cost 
of  transport  (COT)  defined  in  (4.1).  However,  we  demonstrate  in  Figure  4.1  that  opti¬ 
mizing  for  energy  efficiency  results  in  sensitivity  to  perturbations,  thus  poor  performance 
on  rough  terrain. 

During  optimization,  [Westervelt  et  ah,  2003]  impose  many  constraints  such  as  fric¬ 
tion  violations  and  torque  saturation  limits.  On  the  other  hand,  our  method  deals  with 
stochastic  conditions  to  maximize  global  stability  and  no  constraints  are  necessary  for 
optimization  because  it  considers  the  full  dynamics  where,  for  example,  the  controller 
can  hit  the  saturation  limit. 
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D.7  Reference  Design 


While  forming  trajectories  in  phase  instead  of  time,  we  first  scale  and  shift  6  to  have 
an  internal  clock  which  ticks  from  0  to  1  on  the  limit  cycle  using 


,  ,  %)  -  s+ 
T{q)  := 

Then,  the  reference  trajectory  is  determined  as 


(D.23) 


hd(9)  := 


bi(r) 
b2(r) 
h  (r) 
h  (r) 


(DM) 


To  form  trajectories  defined  by  bj(r) s,  we  use  B-splines  as  formulated  in  [Patrikalakis 
and  Maekawa,  2010].  Each  of  the  four  trajectories  is  then  defined  as  a  linear  combination 
of  control  points  a {  and  B-spline  basis  functions  N^r)  given  by 


bj(r)  =  5Z  aiNiAr)i  n>k-  1,  re  [rfc_  i,  rn+1\, 


(D.25) 


i=0 


where  k  is  order  of  the  curve  and  the  number  of  control  points  equals  n  +  1 .  The  basis 
functions  are  uniquely  determined  by  the  knot  vector 


T  ( To ,  T\ ,  •  • . ,  Tfc  —  1?  ^~ki  'I'k+li  Tn—h  T~ni  1~n+  lj  •••;  ’bi+fe), 


(D.26) 


which  consist  of  n  +  k  +  1  elements.  We  use  4th  order  curves  on  the  knot  vector 


T  =  (0,  0,  0,  0, 0.2,  0.4,  0.6,  0.8, 1, 1.2, 1.4, 1.4, 1.4, 1.4).  (D.27) 

The  resulting  10  basis  functions  and  their  derivatives  are  pictured  in  Figure  D.2. 

10  basis  functions  imply  10  control  points  for  each  of  the  4  bj(r) s.  So,  10  x  4  =  40 
a\  parameters  need  to  be  optimized.  However,  we  demand  hybrid  invariance,  so  x~  G  Z 


101 


Extended  Hybrid  Zero  Dynamics 


Chapter  D 


(a)  Basis  functions 


Figure  D.2:  10  basis  functions  of  order  4  with  uniform  knot  vector  in  (D.27)  and  their 
derivatives.  On  the  limit  cycle,  the  internal  clock  ticks  from  r  =  0  to  r  =  1.  We  extend 
the  knots  until  r  =  1.4  for  impacts  on  uneven  terrain. 


102 


on  the  limit  cycle  should  imply  x+  G  Z ,  i.e.,  once  the  walker  is  on  the  zero  dynamics 
manifold  it  should  stay  there  under  deterministic  conditions.  Hybrid  invariance  constraint 
eliminates  2  x  4  =  8  of  the  40  parameters.  This  is  achieved  by  calculating  aJ0  and  a{  in 
terms  of  the  other  aj  values. 

First,  assume  h  —  0  at  the  impact  when  r  =  1.  Note  that  hd\T=1  is  independent  from 
aJ0  and  a\  (see  Figure  D.2a).  Also  hd\T=0  is  a  function  of  aJ0  only.  Achieving  h  —  0  after 
the  impact  requires 


Qo  =  Ag% 


H 


-i 


^d\T=o 

e + 


k 


<Ut=0 

0+ 


A  qH-1 


=  HAgH-1 


hd  lr=i 
6~ 

hd\T=l 

e~ 


(D.28) 


which  determines  aJ0. 


Second,  assume  h  —  0  at  the  impact  with  r  =  1.  Again, 


dhn 


d0 


from  «q  and  a{.  Also 


d  hn 


is  independent 


89 


T= 0 


is  a  function  of  these  two  (see  Figure  D.2b),  one  which 


T=  1 


we  already  determined  above.  Now  remember  (D.13).  Achieving  h  =  0  after  the  impact 


means 


Qo  =  A q(q0  )  %  , 


(D.29) 


or  equivalently 


~dhd 

- 

~dhd 

- 

H -1 

de 

1 

T  — 0 

r  =  Aj(,0-)  h-' 

89 

1 

T=  1 

(D.30) 


from  which  we  calculate  a(.  As  a  result,  32  a\  parameters  that  need  to  be  optimized 
are  given  by  %  G  {2, 3, 4,  5,  6,  7,  8,  9}  and  j  G  {1,  2,  3, 4}.  For  optimization  and  benchmark 
results  of  this  controller  scheme  see  Chapter  4. 
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